Hypotheses (planned):
pip install nba_api
Collecting nba_api
  Downloading nba_api-1.1.11.tar.gz (125 kB)
Requirement already satisfied: requests in c:\users\vytis\anaconda3\lib\site-packages (from nba_api) (2.26.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\vytis\anaconda3\lib\site-packages (from requests->nba_api) (1.26.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\vytis\anaconda3\lib\site-packages (from requests->nba_api) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\vytis\anaconda3\lib\site-packages (from requests->nba_api) (3.2)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\vytis\anaconda3\lib\site-packages (from requests->nba_api) (2021.10.8)
Building wheels for collected packages: nba-api
  Building wheel for nba-api (setup.py): started
  Building wheel for nba-api (setup.py): finished with status 'done'
  Created wheel for nba-api: filename=nba_api-1.1.11-py3-none-any.whl size=251504 sha256=bdad5891ebe83844172f6272cd65203f4d87590ca345f6aee30557126dcc3b19
  Stored in directory: c:\users\vytis\appdata\local\pip\cache\wheels\66\c2\3b\c87a243f9e5d2449e7f2c7bd65de4a6b5ce9a24b33978398a7
Successfully built nba-api
Installing collected packages: nba-api
Successfully installed nba-api-1.1.11
Note: you may need to restart the kernel to use updated packages.
This is an open-source package that gives access to the data published on nba.com without having to write long scraping code.
import numpy as np
from nba_api.stats.endpoints import shotchartdetail
from nba_api.stats.endpoints import commonallplayers
from nba_api.stats.endpoints import playercareerstats
from nba_api.stats.endpoints import teamyearbyyearstats
from nba_api.stats.static import teams
from nba_api.stats.endpoints import playerawards
import time
import json
import requests
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import pickle
import seaborn as sns
from tqdm import tqdm
players_stats = pd.read_pickle('C:\\Users\\Vytis\\player_stats.pkl')
players_stats
PLAYER_ID | SEASON_ID | LEAGUE_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | ... | FT_PCT | OREB | DREB | REB | AST | STL | BLK | TOV | PF | PTS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 51 | 1990-91 | 00 | 1610612743 | DEN | 22.0 | 67 | 19 | 1505.0 | 417 | ... | 0.857 | 34 | 87 | 121 | 206 | 55 | 4 | 110 | 149 | 942 |
1 | 51 | 1991-92 | 00 | 1610612743 | DEN | 23.0 | 81 | 11 | 1538.0 | 356 | ... | 0.870 | 22 | 92 | 114 | 192 | 44 | 4 | 117 | 130 | 837 |
2 | 51 | 1992-93 | 00 | 1610612743 | DEN | 24.0 | 81 | 81 | 2710.0 | 633 | ... | 0.935 | 51 | 174 | 225 | 344 | 84 | 8 | 187 | 179 | 1553 |
3 | 51 | 1993-94 | 00 | 1610612743 | DEN | 25.0 | 80 | 78 | 2617.0 | 588 | ... | 0.956 | 27 | 141 | 168 | 362 | 82 | 10 | 151 | 150 | 1437 |
4 | 51 | 1994-95 | 00 | 1610612743 | DEN | 26.0 | 73 | 43 | 2082.0 | 472 | ... | 0.885 | 32 | 105 | 137 | 263 | 77 | 9 | 119 | 126 | 1165 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17969 | 1627826 | 2018-19 | 00 | 1610612746 | LAC | 22.0 | 26 | 25 | 524.0 | 100 | ... | 0.733 | 61 | 139 | 200 | 38 | 10 | 24 | 37 | 64 | 244 |
17970 | 1627826 | 2018-19 | 00 | 0 | TOT | 22.0 | 59 | 37 | 1039.0 | 212 | ... | 0.802 | 115 | 247 | 362 | 63 | 14 | 51 | 70 | 137 | 525 |
17971 | 1627826 | 2019-20 | 00 | 1610612746 | LAC | 23.0 | 72 | 70 | 1326.0 | 236 | ... | 0.747 | 197 | 346 | 543 | 82 | 16 | 66 | 61 | 168 | 596 |
17972 | 1627826 | 2020-21 | 00 | 1610612746 | LAC | 24.0 | 72 | 33 | 1609.0 | 257 | ... | 0.789 | 189 | 330 | 519 | 90 | 24 | 62 | 81 | 187 | 650 |
17973 | 1627826 | 2021-22 | 00 | 1610612746 | LAC | 24.0 | 35 | 35 | 876.0 | 126 | ... | 0.737 | 95 | 193 | 288 | 38 | 19 | 41 | 48 | 80 | 336 |
17974 rows × 27 columns
players_stats.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17974 entries, 0 to 17973
Data columns (total 27 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   PLAYER_ID          17974 non-null  object
 1   SEASON_ID          17974 non-null  object
 2   LEAGUE_ID          17974 non-null  object
 3   TEAM_ID            17974 non-null  object
 4   TEAM_ABBREVIATION  17974 non-null  object
 5   PLAYER_AGE         17974 non-null  float64
 6   GP                 17974 non-null  object
 7   GS                 17974 non-null  object
 8   MIN                17974 non-null  float64
 9   FGM                17974 non-null  object
 10  FGA                17974 non-null  object
 11  FG_PCT             17974 non-null  float64
 12  FG3M               17966 non-null  object
 13  FG3A               17966 non-null  object
 14  FG3_PCT            17966 non-null  float64
 15  FTM                17974 non-null  object
 16  FTA                17974 non-null  object
 17  FT_PCT             17974 non-null  float64
 18  OREB               17974 non-null  object
 19  DREB               17974 non-null  object
 20  REB                17974 non-null  object
 21  AST                17974 non-null  object
 22  STL                17974 non-null  object
 23  BLK                17974 non-null  object
 24  TOV                17973 non-null  object
 25  PF                 17974 non-null  object
 26  PTS                17974 non-null  object
dtypes: float64(5), object(22)
memory usage: 3.7+ MB
# The columns that matter to me are FG3M, FG3A and FG3_PCT
players_stats['FG3M']
0        24
1        31
2        70
3        42
4        83
         ..
17969     0
17970     0
17971     0
17972     1
17973     0
Name: FG3M, Length: 17974, dtype: object
players_stats['FG3A']
0        100
1         94
2        197
3        133
4        215
        ...
17969      0
17970      0
17971      2
17972      4
17973      0
Name: FG3A, Length: 17974, dtype: object
players_stats['FG3_PCT']
0        0.240
1        0.330
2        0.355
3        0.316
4        0.386
         ...
17969    0.000
17970    0.000
17971    0.000
17972    0.250
17973    0.000
Name: FG3_PCT, Length: 17974, dtype: float64
players_stats['SEASON_ID'] = players_stats['SEASON_ID'].map(lambda x: int(x.split("-",1)[0]))
players_stats['SEASON_ID']
0        1990
1        1991
2        1992
3        1993
4        1994
         ...
17969    2018
17970    2018
17971    2019
17972    2020
17973    2021
Name: SEASON_ID, Length: 17974, dtype: int64
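The conversion above works only while `SEASON_ID` still holds strings such as `"1990-91"`. A minimal sketch of a parser that also accepts values that were already converted is shown here; the helper name `season_start_year` is hypothetical and not part of the notebook.

```python
def season_start_year(season):
    """Return the starting year of a season label.

    Accepts either a label such as "1990-91" or a year that has
    already been converted (for example 1990), so the function can
    be applied more than once without failing.
    """
    if isinstance(season, int):
        return season
    # str() also covers NumPy integer types, which are not Python ints
    return int(str(season).split("-", 1)[0])


first = season_start_year("1990-91")   # string label from the raw data
second = season_start_year(first)      # applying it again is harmless
```

With pandas this could be used as `players_stats['SEASON_ID'].map(season_start_year)`, assuming the column holds only labels of this shape or plain years.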
# Join in the players' names so that the data has more than just IDs
all_players = commonallplayers.CommonAllPlayers().get_data_frames()[0]
all_players
PERSON_ID | DISPLAY_LAST_COMMA_FIRST | DISPLAY_FIRST_LAST | ROSTERSTATUS | FROM_YEAR | TO_YEAR | PLAYERCODE | PLAYER_SLUG | TEAM_ID | TEAM_CITY | TEAM_NAME | TEAM_ABBREVIATION | TEAM_CODE | TEAM_SLUG | GAMES_PLAYED_FLAG | OTHERLEAGUE_EXPERIENCE_CH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 76001 | Abdelnaby, Alaa | Alaa Abdelnaby | 0 | 1990 | 1994 | HISTADD_alaa_abdelnaby | alaa_abdelnaby | 0 | None | Y | 00 | ||||
1 | 76002 | Abdul-Aziz, Zaid | Zaid Abdul-Aziz | 0 | 1968 | 1977 | HISTADD_zaid_abdul-aziz | zaid_abdul-aziz | 0 | None | Y | 00 | ||||
2 | 76003 | Abdul-Jabbar, Kareem | Kareem Abdul-Jabbar | 0 | 1969 | 1988 | HISTADD_kareem_abdul-jabbar | kareem_abdul-jabbar | 0 | None | Y | 00 | ||||
3 | 51 | Abdul-Rauf, Mahmoud | Mahmoud Abdul-Rauf | 0 | 1990 | 2000 | mahmoud_abdul-rauf | mahmoud_abdul-rauf | 0 | None | Y | 00 | ||||
4 | 1505 | Abdul-Wahad, Tariq | Tariq Abdul-Wahad | 0 | 1997 | 2003 | tariq_abdul-wahad | tariq_abdul-wahad | 0 | None | Y | 00 | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4722 | 1627790 | Zizic, Ante | Ante Zizic | 0 | 2017 | 2019 | ante_zizic | ante_zizic | 0 | None | Y | 01 | ||||
4723 | 78647 | Zoet, Jim | Jim Zoet | 0 | 1982 | 1982 | HISTADD_jim_zoet | jim_zoet | 0 | None | Y | 00 | ||||
4724 | 78648 | Zopf, Bill | Bill Zopf | 0 | 1970 | 1970 | HISTADD_zip_zopf | bill_zopf | 0 | None | Y | 00 | ||||
4725 | 1627826 | Zubac, Ivica | Ivica Zubac | 1 | 2016 | 2021 | ivica_zubac | ivica_zubac | 1610612746 | LA | Clippers | LAC | clippers | clippers | Y | 01 |
4726 | 78650 | Zunic, Matt | Matt Zunic | 0 | 1948 | 1948 | HISTADD_matt_zunic | matt_zunic | 0 | None | Y | 00 |
4727 rows × 16 columns
all_players.rename(columns={'PERSON_ID':'PLAYER_ID'}, inplace=True)
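# `players` is assumed to be a list of dicts with 'firstName', 'lastName'
# and 'playerId' keys that was loaded earlier (the loading cell is not shown)
def get_player_id(first, last):
    for player in players:
        if player['firstName'] == first and player['lastName'] == last:
            return player['playerId']
    return -1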
get_player_id('Jonas', 'Valanciunas')
202685
all_players[all_players['PLAYER_ID'] == get_player_id('Jonas', 'Valanciunas')]
PLAYER_ID | DISPLAY_LAST_COMMA_FIRST | DISPLAY_FIRST_LAST | ROSTERSTATUS | FROM_YEAR | TO_YEAR | PLAYERCODE | PLAYER_SLUG | TEAM_ID | TEAM_CITY | TEAM_NAME | TEAM_ABBREVIATION | TEAM_CODE | TEAM_SLUG | GAMES_PLAYED_FLAG | OTHERLEAGUE_EXPERIENCE_CH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4273 | 202685 | Valanciunas, Jonas | Jonas Valanciunas | 1 | 2012 | 2021 | jonas_valanciunas | jonas_valanciunas | 1610612740 | New Orleans | Pelicans | NOP | pelicans | pelicans | Y | 00 |
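The `get_player_id` lookup above scans the whole list on every call. As a sketch only, the same lookup can be backed by a dictionary built once; the two sample records reuse IDs that appear in this notebook, while the `players` list itself and the helper names `build_name_index` and `lookup_player_id` are assumptions made for illustration.

```python
# Hypothetical sample in the shape get_player_id expects
players = [
    {"firstName": "Jonas", "lastName": "Valanciunas", "playerId": 202685},
    {"firstName": "Ivica", "lastName": "Zubac", "playerId": 1627826},
]


def build_name_index(player_records):
    """Map (first name, last name) to player ID in one pass."""
    return {
        (p["firstName"], p["lastName"]): p["playerId"]
        for p in player_records
    }


def lookup_player_id(index, first, last):
    """Return the player ID, or -1 when the name is unknown."""
    return index.get((first, last), -1)


name_index = build_name_index(players)
valanciunas_id = lookup_player_id(name_index, "Jonas", "Valanciunas")
unknown_id = lookup_player_id(name_index, "No", "Body")
```

Returning -1 for a missing name mirrors the original function; raising a `KeyError` would be an equally reasonable design if silent misses are a concern.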
players_stats[players_stats['PLAYER_ID'] == 2557]
PLAYER_ID | SEASON_ID | LEAGUE_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | ... | FT_PCT | OREB | DREB | REB | AST | STL | BLK | TOV | PF | PTS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13696 | 2557 | 2003-04 | 00 | 1610612760 | SEA | 23.0 | 69 | 6 | 1114.0 | 145 | ... | 0.823 | 35 | 73 | 108 | 163 | 52 | 7 | 80 | 105 | 382 |
13697 | 2557 | 2004-05 | 00 | 1610612760 | SEA | 24.0 | 82 | 82 | 2572.0 | 299 | ... | 0.883 | 55 | 149 | 204 | 483 | 94 | 23 | 149 | 185 | 824 |
13698 | 2557 | 2005-06 | 00 | 1610612760 | SEA | 25.0 | 79 | 77 | 2625.0 | 334 | ... | 0.877 | 47 | 188 | 235 | 550 | 123 | 22 | 162 | 178 | 910 |
13699 | 2557 | 2006-07 | 00 | 1610612760 | SEA | 26.0 | 71 | 58 | 2091.0 | 301 | ... | 0.805 | 28 | 136 | 164 | 368 | 83 | 19 | 156 | 181 | 779 |
13700 | 2557 | 2007-08 | 00 | 1610612760 | SEA | 27.0 | 61 | 5 | 1223.0 | 147 | ... | 0.857 | 13 | 81 | 94 | 241 | 37 | 14 | 80 | 120 | 393 |
13701 | 2557 | 2008-09 | 00 | 1610612749 | MIL | 28.0 | 72 | 50 | 2033.0 | 250 | ... | 0.869 | 33 | 185 | 218 | 365 | 92 | 16 | 128 | 200 | 688 |
13702 | 2557 | 2009-10 | 00 | 1610612749 | MIL | 29.0 | 82 | 0 | 1759.0 | 328 | ... | 0.907 | 24 | 119 | 143 | 324 | 54 | 7 | 106 | 182 | 852 |
13703 | 2557 | 2010-11 | 00 | 1610612750 | MIN | 30.0 | 71 | 66 | 2159.0 | 319 | ... | 0.883 | 37 | 162 | 199 | 384 | 89 | 10 | 158 | 146 | 840 |
13704 | 2557 | 2011-12 | 00 | 1610612750 | MIN | 31.0 | 53 | 53 | 1750.0 | 242 | ... | 0.891 | 19 | 122 | 141 | 252 | 56 | 16 | 96 | 136 | 639 |
13705 | 2557 | 2012-13 | 00 | 1610612750 | MIN | 32.0 | 82 | 82 | 2474.0 | 367 | ... | 0.848 | 41 | 165 | 206 | 311 | 82 | 15 | 130 | 187 | 939 |
13706 | 2557 | 2013-14 | 00 | 1610612749 | MIL | 33.0 | 36 | 12 | 763.0 | 84 | ... | 0.684 | 15 | 47 | 62 | 122 | 23 | 2 | 46 | 55 | 206 |
13707 | 2557 | 2013-14 | 00 | 1610612766 | CHA | 33.0 | 25 | 2 | 378.0 | 42 | ... | 0.571 | 8 | 27 | 35 | 54 | 9 | 6 | 20 | 37 | 100 |
13708 | 2557 | 2013-14 | 00 | 0 | TOT | 33.0 | 61 | 14 | 1141.0 | 126 | ... | 0.654 | 23 | 74 | 97 | 176 | 32 | 8 | 66 | 92 | 306 |
13709 | 2557 | 2014-15 | 00 | 1610612753 | ORL | 34.0 | 47 | 0 | 683.0 | 75 | ... | 0.857 | 8 | 60 | 68 | 96 | 20 | 4 | 38 | 67 | 188 |
14 rows × 27 columns
all_players.columns
Index(['PLAYER_ID', 'DISPLAY_LAST_COMMA_FIRST', 'DISPLAY_FIRST_LAST', 'ROSTERSTATUS', 'FROM_YEAR', 'TO_YEAR', 'PLAYERCODE', 'PLAYER_SLUG', 'TEAM_ID', 'TEAM_CITY', 'TEAM_NAME', 'TEAM_ABBREVIATION', 'TEAM_CODE', 'TEAM_SLUG', 'GAMES_PLAYED_FLAG', 'OTHERLEAGUE_EXPERIENCE_CH'], dtype='object')
all_players_columns = ['DISPLAY_LAST_COMMA_FIRST',
                       'ROSTERSTATUS', 'FROM_YEAR', 'TO_YEAR', 'PLAYERCODE', 'PLAYER_SLUG',
                       'TEAM_ID', 'TEAM_ABBREVIATION', 'TEAM_CODE',
                       'TEAM_SLUG', 'GAMES_PLAYED_FLAG', 'OTHERLEAGUE_EXPERIENCE_CH']
all_players_to_merge = all_players.drop(all_players_columns, axis=1)
all_players_to_merge
PLAYER_ID | DISPLAY_FIRST_LAST | TEAM_CITY | TEAM_NAME | |
---|---|---|---|---|
0 | 76001 | Alaa Abdelnaby | ||
1 | 76002 | Zaid Abdul-Aziz | ||
2 | 76003 | Kareem Abdul-Jabbar | ||
3 | 51 | Mahmoud Abdul-Rauf | ||
4 | 1505 | Tariq Abdul-Wahad | ||
... | ... | ... | ... | ... |
4722 | 1627790 | Ante Zizic | ||
4723 | 78647 | Jim Zoet | ||
4724 | 78648 | Bill Zopf | ||
4725 | 1627826 | Ivica Zubac | LA | Clippers |
4726 | 78650 | Matt Zunic |
4727 rows × 4 columns
players_stats = pd.merge(players_stats, all_players_to_merge, on=['PLAYER_ID'])
players_stats
PLAYER_ID | SEASON_ID | LEAGUE_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | ... | BLK | TOV | PF | PTS | DISPLAY_FIRST_LAST_x | TEAM_CITY_x | TEAM_NAME_x | DISPLAY_FIRST_LAST_y | TEAM_CITY_y | TEAM_NAME_y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 51 | 1990-91 | 00 | 1610612743 | DEN | 22.0 | 67 | 19 | 1505.0 | 417 | ... | 4 | 110 | 149 | 942 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
1 | 51 | 1991-92 | 00 | 1610612743 | DEN | 23.0 | 81 | 11 | 1538.0 | 356 | ... | 4 | 117 | 130 | 837 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
2 | 51 | 1992-93 | 00 | 1610612743 | DEN | 24.0 | 81 | 81 | 2710.0 | 633 | ... | 8 | 187 | 179 | 1553 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
3 | 51 | 1993-94 | 00 | 1610612743 | DEN | 25.0 | 80 | 78 | 2617.0 | 588 | ... | 10 | 151 | 150 | 1437 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
4 | 51 | 1994-95 | 00 | 1610612743 | DEN | 26.0 | 73 | 43 | 2082.0 | 472 | ... | 9 | 119 | 126 | 1165 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17969 | 1627826 | 2018-19 | 00 | 1610612746 | LAC | 22.0 | 26 | 25 | 524.0 | 100 | ... | 24 | 37 | 64 | 244 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17970 | 1627826 | 2018-19 | 00 | 0 | TOT | 22.0 | 59 | 37 | 1039.0 | 212 | ... | 51 | 70 | 137 | 525 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17971 | 1627826 | 2019-20 | 00 | 1610612746 | LAC | 23.0 | 72 | 70 | 1326.0 | 236 | ... | 66 | 61 | 168 | 596 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17972 | 1627826 | 2020-21 | 00 | 1610612746 | LAC | 24.0 | 72 | 33 | 1609.0 | 257 | ... | 62 | 81 | 187 | 650 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17973 | 1627826 | 2021-22 | 00 | 1610612746 | LAC | 24.0 | 35 | 35 | 876.0 | 126 | ... | 41 | 48 | 80 | 336 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17974 rows × 33 columns
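The `_x` and `_y` column pairs in the output above are what pandas produces when both frames already carry the same non-key columns, which happens if the merge cell is run a second time. A minimal sketch of a guard follows, using small made-up frames; the helper name `merge_once` is hypothetical.

```python
import pandas as pd

# Tiny stand-ins for players_stats and all_players_to_merge
stats = pd.DataFrame({"PLAYER_ID": [51, 51, 1505], "FG3A": [100, 94, 12]})
names = pd.DataFrame({
    "PLAYER_ID": [51, 1505],
    "DISPLAY_FIRST_LAST": ["Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad"],
})


def merge_once(left, right, key):
    """Merge only the columns of `right` that `left` does not have yet."""
    new_cols = [c for c in right.columns if c != key and c not in left.columns]
    if not new_cols:
        # Nothing new to add, so a repeated call leaves `left` unchanged
        return left
    # validate= raises if `right` has duplicate keys, which would
    # silently multiply rows in `left`
    return left.merge(right[[key] + new_cols], on=key, how="left",
                      validate="many_to_one")


merged = merge_once(stats, names, "PLAYER_ID")
merged_again = merge_once(merged, names, "PLAYER_ID")
```

Note that `how="left"` keeps players without a name match, whereas the notebook's default inner merge would drop them; which behavior is wanted is a judgment call.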
# Running the conversion a second time fails, because SEASON_ID already holds ints
players_stats['SEASON_ID'] = players_stats['SEASON_ID'].map(lambda x: int(x.split("-",1)[0]))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_21052/3326492598.py in <module>
----> 1 players_stats['SEASON_ID'] = players_stats['SEASON_ID'].map(lambda x: int(x.split("-",1)[0]))
      2 players_stats

~\anaconda3\lib\site-packages\pandas\core\series.py in map(self, arg, na_action)
   4159         dtype: object
   4160         """
-> 4161         new_values = super()._map_values(arg, na_action=na_action)
   4162         return self._constructor(new_values, index=self.index).__finalize__(
   4163             self, method="map"

~\anaconda3\lib\site-packages\pandas\core\base.py in _map_values(self, mapper, na_action)
    868
    869         # mapper is a function
--> 870         new_values = map_f(values, mapper)
    871
    872         return new_values

~\anaconda3\lib\site-packages\pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

~\AppData\Local\Temp/ipykernel_21052/3326492598.py in <lambda>(x)
----> 1 players_stats['SEASON_ID'] = players_stats['SEASON_ID'].map(lambda x: int(x.split("-",1)[0]))
      2 players_stats

AttributeError: 'int' object has no attribute 'split'
players_stats
PLAYER_ID | SEASON_ID | LEAGUE_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | ... | BLK | TOV | PF | PTS | DISPLAY_FIRST_LAST_x | TEAM_CITY_x | TEAM_NAME_x | DISPLAY_FIRST_LAST_y | TEAM_CITY_y | TEAM_NAME_y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 51 | 1990 | 00 | 1610612743 | DEN | 22.0 | 67 | 19 | 1505.0 | 417 | ... | 4 | 110 | 149 | 942 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
1 | 51 | 1991 | 00 | 1610612743 | DEN | 23.0 | 81 | 11 | 1538.0 | 356 | ... | 4 | 117 | 130 | 837 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
2 | 51 | 1992 | 00 | 1610612743 | DEN | 24.0 | 81 | 81 | 2710.0 | 633 | ... | 8 | 187 | 179 | 1553 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
3 | 51 | 1993 | 00 | 1610612743 | DEN | 25.0 | 80 | 78 | 2617.0 | 588 | ... | 10 | 151 | 150 | 1437 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
4 | 51 | 1994 | 00 | 1610612743 | DEN | 26.0 | 73 | 43 | 2082.0 | 472 | ... | 9 | 119 | 126 | 1165 | Mahmoud Abdul-Rauf | Mahmoud Abdul-Rauf | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17969 | 1627826 | 2018 | 00 | 1610612746 | LAC | 22.0 | 26 | 25 | 524.0 | 100 | ... | 24 | 37 | 64 | 244 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17970 | 1627826 | 2018 | 00 | 0 | TOT | 22.0 | 59 | 37 | 1039.0 | 212 | ... | 51 | 70 | 137 | 525 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17971 | 1627826 | 2019 | 00 | 1610612746 | LAC | 23.0 | 72 | 70 | 1326.0 | 236 | ... | 66 | 61 | 168 | 596 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17972 | 1627826 | 2020 | 00 | 1610612746 | LAC | 24.0 | 72 | 33 | 1609.0 | 257 | ... | 62 | 81 | 187 | 650 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17973 | 1627826 | 2021 | 00 | 1610612746 | LAC | 24.0 | 35 | 35 | 876.0 | 126 | ... | 41 | 48 | 80 | 336 | Ivica Zubac | LA | Clippers | Ivica Zubac | LA | Clippers |
17974 rows × 33 columns
# seasons 2012-2021
players_stats_last10 = players_stats[players_stats['SEASON_ID'] > 2011]
# seasons 2002-2011
players_stats_before10 = players_stats[(players_stats['SEASON_ID'] > 2001) & (players_stats['SEASON_ID'] < 2012)]
players_stats_before10
PLAYER_ID | SEASON_ID | LEAGUE_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | ... | FT_PCT | OREB | DREB | REB | AST | STL | BLK | TOV | PF | PTS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
18 | 1505 | 2002 | 00 | 1610612742 | DAL | 28.0 | 14 | 0 | 204.0 | 27 | ... | 0.500 | 14 | 26 | 40 | 21 | 6 | 3 | 7 | 26 | 57 |
25 | 949 | 2002 | 00 | 1610612737 | ATL | 26.0 | 81 | 81 | 3087.0 | 566 | ... | 0.841 | 175 | 502 | 677 | 242 | 87 | 38 | 212 | 240 | 1608 |
26 | 949 | 2003 | 00 | 1610612737 | ATL | 27.0 | 53 | 53 | 1955.0 | 383 | ... | 0.880 | 141 | 354 | 495 | 127 | 44 | 19 | 131 | 147 | 1065 |
27 | 949 | 2003 | 00 | 1610612757 | POR | 27.0 | 32 | 3 | 729.0 | 118 | ... | 0.832 | 48 | 96 | 144 | 47 | 24 | 18 | 53 | 75 | 319 |
28 | 949 | 2003 | 00 | 0 | TOT | 27.0 | 85 | 56 | 2684.0 | 501 | ... | 0.869 | 189 | 450 | 639 | 174 | 68 | 37 | 184 | 222 | 1384 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17948 | 1917 | 2003 | 00 | 1610612746 | LAC | 26.0 | 2 | 0 | 9.0 | 0 | ... | 1.000 | 0 | 4 | 4 | 0 | 0 | 1 | 0 | 0 | 4 |
17949 | 1917 | 2003 | 00 | 1610612748 | MIA | 26.0 | 14 | 0 | 106.0 | 17 | ... | 0.833 | 4 | 10 | 14 | 2 | 3 | 4 | 6 | 13 | 43 |
17950 | 1917 | 2003 | 00 | 0 | TOT | 26.0 | 16 | 0 | 115.0 | 17 | ... | 0.900 | 4 | 14 | 18 | 2 | 3 | 5 | 6 | 13 | 47 |
17951 | 1917 | 2004 | 00 | 1610612748 | MIA | 27.0 | 20 | 0 | 91.0 | 17 | ... | 0.583 | 6 | 12 | 18 | 5 | 3 | 2 | 5 | 9 | 43 |
17959 | 2583 | 2005 | 00 | 1610612751 | NJN | 24.0 | 2 | 0 | 32.0 | 2 | ... | 0.000 | 1 | 3 | 4 | 7 | 0 | 0 | 4 | 4 | 4 |
5663 rows × 27 columns
players_stats_last10['FG3A'].sum()
683631
players_stats_before10['FG3A'].sum()
447277
# If a short answer were enough, then yes
players_stats_last10['FG3A'].sum() > players_stats_before10['FG3A'].sum()
True
# But it is important to see the trend: how the number of three-pointers changed over those years
# Continuing to clean up the dataset...
players_stats.drop(['LEAGUE_ID', 'DISPLAY_FIRST_LAST_x', 'TEAM_CITY_x', 'TEAM_NAME_x'], axis=1, inplace=True)
players_stats.rename(columns={'DISPLAY_FIRST_LAST_y':'PLAYER_NAME'}, inplace=True)
players_stats = players_stats.set_index(['PLAYER_ID'])
# I noticed there are TOT rows that distort the data; they presumably hold a player's season TOTAL across all teams.
players_stats = players_stats[players_stats.TEAM_ABBREVIATION != 'TOT']
players_stats[players_stats['TEAM_ABBREVIATION'] == 'TOT']
SEASON_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | FGA | FG_PCT | ... | AST | STL | BLK | TOV | PF | PTS | PLAYER_NAME | TEAM_CITY_y | TEAM_NAME_y | FG3A_per_game | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PLAYER_ID |
0 rows × 29 columns
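To illustrate why the TOT rows distort sums, the sketch below reuses the GP and PTS values shown earlier for player 2557 in 2013-14 (MIL, CHA and TOT). The TOT row equals the sum of the two team rows, so keeping it counts that season twice. The variable names are chosen for this example only.

```python
import pandas as pd

# Values copied from the 2013-14 rows of player 2557 shown earlier
rows = pd.DataFrame({
    "PLAYER_ID": [2557, 2557, 2557],
    "SEASON_ID": [2013, 2013, 2013],
    "TEAM_ABBREVIATION": ["MIL", "CHA", "TOT"],
    "GP": [36, 25, 61],
    "PTS": [206, 100, 306],
})

is_total = rows["TEAM_ABBREVIATION"] == "TOT"
team_rows = rows[~is_total]

# Summing the per-team rows reproduces the TOT row
summed = team_rows.groupby(["PLAYER_ID", "SEASON_ID"])[["GP", "PTS"]].sum()

naive_pts = int(rows["PTS"].sum())       # includes TOT, so double counted
clean_pts = int(team_rows["PTS"].sum())  # per-team rows only
```

An alternative with the same effect on season totals would be to keep only the TOT row for traded players and drop their per-team rows; the notebook's choice of dropping TOT keeps the team attribution.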
players_stats = players_stats.dropna()
# Made a narrowed-down subset: seasons after 2000 (2001 onward)
players_stats1 = players_stats[players_stats['SEASON_ID'] > 2000]
data1 = pd.DataFrame(players_stats1.groupby('SEASON_ID')['FG3A'].sum())
data1
SEASON_ID | FG3A |
---|---|
2001 | 35074 |
2002 | 34912 |
2003 | 35492 |
2004 | 38748 |
2005 | 39313 |
2006 | 41672 |
2007 | 44544 |
2008 | 44583 |
2009 | 44622 |
2010 | 44313 |
2011 | 36395 |
2012 | 49067 |
2013 | 52974 |
2014 | 55137 |
2015 | 59241 |
2016 | 66421 |
2017 | 71339 |
2018 | 78742 |
2019 | 72252 |
2020 | 74822 |
2021 | 38286 |
data1.plot.bar()
plt.title('Three-pointers attempted')
Text(0.5, 1.0, 'Three-pointers attempted')
Hypothesis confirmed
# There are large deviations because the latest season is not finished yet, and not every season had the same number of games
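Because seasons differ in length, raw totals can fall even when players shoot more often. One way to make seasons comparable, sketched here with made-up numbers, is to divide total attempts by total player-games; the column name `FG3A_per_player_game` is an assumption for this example, and the real frame would first need its object columns converted to numbers.

```python
import pandas as pd

# Hypothetical player-season rows: an 82-game season and a shortened one
season_rows = pd.DataFrame({
    "SEASON_ID": [2010, 2010, 2011, 2011],
    "GP": [82, 82, 66, 66],
    "FG3A": [400, 300, 330, 264],
})

per_season = season_rows.groupby("SEASON_ID")[["FG3A", "GP"]].sum()
# Attempts per player-game removes the effect of season length
per_season["FG3A_per_player_game"] = per_season["FG3A"] / per_season["GP"]
```

In this toy data the total drops from 700 to 594 attempts while the rate rises, which is the pattern a shortened season can produce.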
players_stats1 = players_stats1[(players_stats1['FG3A'] > 100)] # drop low-volume seasons so that extreme percentages do not dominate
players_stats1.groupby('PLAYER_NAME')['FG3_PCT'].mean().sort_values(ascending=False).head(10)
PLAYER_NAME
Fred Hoiberg          0.462500
Hubert Davis          0.452000
Pau Gasol             0.448000
Jason Kapono          0.441000
Seth Curry            0.439167
Wesley Person         0.438500
Joe Harris            0.435400
Steve Novak           0.434750
Michael Porter Jr.    0.433500
Stephen Curry         0.433333
Name: FG3_PCT, dtype: float64
data2= pd.DataFrame(players_stats1.groupby('PLAYER_NAME')['FG3_PCT'].mean().sort_values(ascending=False).head(10))
data2
PLAYER_NAME | FG3_PCT |
---|---|
Fred Hoiberg | 0.462500 |
Hubert Davis | 0.452000 |
Pau Gasol | 0.448000 |
Jason Kapono | 0.441000 |
Seth Curry | 0.439167 |
Wesley Person | 0.438500 |
Joe Harris | 0.435400 |
Steve Novak | 0.434750 |
Michael Porter Jr. | 0.433500 |
Stephen Curry | 0.433333 |
data2.plot.bar()
plt.title('Three-point accuracy')
Text(0.5, 1.0, 'Three-point accuracy')
Hypothesis rejected
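The ranking above averages each player's season percentages, so a season with 101 attempts weighs as much as one with 800. A sketch of the alternative, career makes divided by career attempts, is shown with two made-up players; it is meant as an illustration of the difference, not as a correction of the result.

```python
import pandas as pd

# Hypothetical seasons; every season has more than 100 attempts
seasons = pd.DataFrame({
    "PLAYER_NAME": ["Player A", "Player A", "Player B", "Player B"],
    "FG3M": [60, 200, 150, 160],
    "FG3A": [120, 600, 400, 400],
})
seasons["FG3_PCT"] = seasons["FG3M"] / seasons["FG3A"]

# Mean of season percentages (the approach used above)
unweighted = seasons.groupby("PLAYER_NAME")["FG3_PCT"].mean()

# Attempt-weighted percentage: total makes over total attempts
totals = seasons.groupby("PLAYER_NAME")[["FG3M", "FG3A"]].sum()
weighted = totals["FG3M"] / totals["FG3A"]

best_unweighted = unweighted.idxmax()
best_weighted = weighted.idxmax()
```

Here the two measures disagree on who ranks first, because Player A's best percentage came in a low-volume season.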
players_stats2 = players_stats[(players_stats['FG3A'] > 100)]
data3 = pd.DataFrame(players_stats2.groupby('PLAYER_NAME')['FG3M'].sum().sort_values(ascending=False).head(10))
data3
PLAYER_NAME | FG3M |
---|---|
Stephen Curry | 2998 |
Ray Allen | 2973 |
Reggie Miller | 2560 |
James Harden | 2495 |
Kyle Korver | 2425 |
Jason Terry | 2282 |
Vince Carter | 2225 |
Jamal Crawford | 2159 |
Damian Lillard | 2143 |
Paul Pierce | 2128 |
data3.plot.bar()
plt.title('Three-pointers made')
Text(0.5, 1.0, 'Three-pointers made')
players_stats2['FG3M'].max()
402
players_stats2[players_stats2['FG3M'] == 402] # he also made the most three-pointers in a single season
SEASON_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | FGA | FG_PCT | ... | AST | STL | BLK | TOV | PF | PTS | PLAYER_NAME | TEAM_CITY_y | TEAM_NAME_y | FG3A_per_game | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PLAYER_ID | |||||||||||||||||||||
201939 | 2015 | 1610612744 | GSW | 28.0 | 79 | 79 | 2700.0 | 805 | 1598 | 0.504 | ... | 527 | 169 | 15 | 262 | 161 | 2375 | Stephen Curry | Golden State | Warriors | 11.21519 |
1 rows × 29 columns
# At the end of last year Curry passed R. Allen and became the all-time leader in three-pointers made
data4 = pd.DataFrame(players_stats2.groupby('PLAYER_NAME')['FG3A'].sum().sort_values(ascending=False).head(10))
data4
PLAYER_NAME | FG3A |
---|---|
Ray Allen | 7429 |
Stephen Curry | 6936 |
James Harden | 6879 |
Reggie Miller | 6486 |
Jamal Crawford | 6242 |
Jason Terry | 6010 |
Vince Carter | 5965 |
LeBron James | 5944 |
Paul Pierce | 5773 |
Damian Lillard | 5752 |
data4.plot.bar()
plt.title('Three-pointers attempted')
Text(0.5, 1.0, 'Three-pointers attempted')
So the hypothesis was not confirmed: R. Allen has attempted the most three-pointers.
players_stats
SEASON_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | FGA | FG_PCT | ... | AST | STL | BLK | TOV | PF | PTS | PLAYER_NAME | TEAM_CITY_y | TEAM_NAME_y | FG3A_per_game | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PLAYER_ID | |||||||||||||||||||||
51 | 1990 | 1610612743 | DEN | 22.0 | 67 | 19 | 1505.0 | 417 | 1009 | 0.413 | ... | 206 | 55 | 4 | 110 | 149 | 942 | Mahmoud Abdul-Rauf | 1.492537 | ||
51 | 1991 | 1610612743 | DEN | 23.0 | 81 | 11 | 1538.0 | 356 | 845 | 0.421 | ... | 192 | 44 | 4 | 117 | 130 | 837 | Mahmoud Abdul-Rauf | 1.160494 | ||
51 | 1992 | 1610612743 | DEN | 24.0 | 81 | 81 | 2710.0 | 633 | 1407 | 0.450 | ... | 344 | 84 | 8 | 187 | 179 | 1553 | Mahmoud Abdul-Rauf | 2.432099 | ||
51 | 1993 | 1610612743 | DEN | 25.0 | 80 | 78 | 2617.0 | 588 | 1279 | 0.460 | ... | 362 | 82 | 10 | 151 | 150 | 1437 | Mahmoud Abdul-Rauf | 1.6625 | ||
51 | 1994 | 1610612743 | DEN | 26.0 | 73 | 43 | 2082.0 | 472 | 1005 | 0.470 | ... | 263 | 77 | 9 | 119 | 126 | 1165 | Mahmoud Abdul-Rauf | 2.945205 | ||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1627826 | 2018 | 1610612747 | LAL | 22.0 | 33 | 12 | 516.0 | 112 | 193 | 0.580 | ... | 25 | 4 | 27 | 33 | 73 | 281 | Ivica Zubac | LA | Clippers | 0.0 |
1627826 | 2018 | 1610612746 | LAC | 22.0 | 26 | 25 | 524.0 | 100 | 186 | 0.538 | ... | 38 | 10 | 24 | 37 | 64 | 244 | Ivica Zubac | LA | Clippers | 0.0 |
1627826 | 2019 | 1610612746 | LAC | 23.0 | 72 | 70 | 1326.0 | 236 | 385 | 0.613 | ... | 82 | 16 | 66 | 61 | 168 | 596 | Ivica Zubac | LA | Clippers | 0.027778 |
1627826 | 2020 | 1610612746 | LAC | 24.0 | 72 | 33 | 1609.0 | 257 | 394 | 0.652 | ... | 90 | 24 | 62 | 81 | 187 | 650 | Ivica Zubac | LA | Clippers | 0.055556 |
1627826 | 2021 | 1610612746 | LAC | 24.0 | 35 | 35 | 876.0 | 126 | 192 | 0.656 | ... | 38 | 19 | 41 | 48 | 80 | 336 | Ivica Zubac | LA | Clippers | 0.0 |
16393 rows × 29 columns
players_stats_test = players_stats[(players_stats['FG3A'] > 100)]
players_stats_test.plot.scatter(x='PLAYER_AGE', y='FG3_PCT')
*c* argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with *x* & *y*. Please use the *color* keyword-argument or provide a 2D array with a single row if you intend to specify the same RGB or RGBA value for all points.
<AxesSubplot:xlabel='PLAYER_AGE', ylabel='FG3_PCT'>
X = players_stats_test[['PLAYER_AGE']]
X
PLAYER_AGE | |
---|---|
PLAYER_ID | |
51 | 24.0 |
51 | 25.0 |
51 | 26.0 |
51 | 27.0 |
51 | 28.0 |
... | ... |
1629027 | 21.0 |
1629027 | 22.0 |
1629027 | 23.0 |
1917 | 24.0 |
1627835 | 24.0 |
4857 rows × 1 columns
y = players_stats_test['FG3_PCT']
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model
LinearRegression()
model.fit(X, y)
LinearRegression()
prediction = model.predict(X)
prediction
array([0.35420242, 0.35598891, 0.35777539, ..., 0.35241594, 0.35420242, 0.35420242])
players_stats_test['prediction'] = prediction
players_stats_test[['PLAYER_AGE', 'FG3_PCT', 'prediction']]
C:\Users\Vytis\AppData\Local\Temp/ipykernel_21052/1002346078.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  players_stats_test['prediction'] = prediction
PLAYER_AGE | FG3_PCT | prediction | |
---|---|---|---|
PLAYER_ID | |||
51 | 24.0 | 0.355 | 0.354202 |
51 | 25.0 | 0.316 | 0.355989 |
51 | 26.0 | 0.386 | 0.357775 |
51 | 27.0 | 0.392 | 0.359562 |
51 | 28.0 | 0.382 | 0.361348 |
... | ... | ... | ... |
1629027 | 21.0 | 0.361 | 0.348843 |
1629027 | 22.0 | 0.343 | 0.350629 |
1629027 | 23.0 | 0.371 | 0.352416 |
1917 | 24.0 | 0.414 | 0.354202 |
1627835 | 24.0 | 0.336 | 0.354202 |
4857 rows × 3 columns
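The `SettingWithCopyWarning` above appears because `players_stats_test` was created by filtering another frame and then received a new column. One common way to avoid it, sketched here on a small made-up frame, is to take an explicit copy of the filtered rows before adding columns.

```python
import pandas as pd

# Small stand-in for players_stats
stats = pd.DataFrame({
    "PLAYER_AGE": [24.0, 25.0, 21.0],
    "FG3A": [197, 50, 150],
    "FG3_PCT": [0.355, 0.316, 0.361],
})

# .copy() makes the subset an independent frame, so adding a column
# is unambiguous and leaves the original untouched
subset = stats[stats["FG3A"] > 100].copy()
subset["prediction"] = [0.354, 0.349]  # placeholder values for illustration
```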
plt.scatter(X, y)
plt.plot(X, prediction, 'r')
[<matplotlib.lines.Line2D at 0x204e9458640>]
r_squared = model.score(X, y)
r_squared
0.026042196049113397
# a very poor fit?
steph = players_stats[players_stats['PLAYER_NAME'] == 'Stephen Curry'][['PLAYER_AGE', 'FG_PCT']]
steph
PLAYER_AGE | FG_PCT | |
---|---|---|
PLAYER_ID | ||
201939 | 22.0 | 0.462 |
201939 | 23.0 | 0.480 |
201939 | 24.0 | 0.490 |
201939 | 25.0 | 0.451 |
201939 | 26.0 | 0.471 |
201939 | 27.0 | 0.487 |
201939 | 28.0 | 0.504 |
201939 | 29.0 | 0.468 |
201939 | 30.0 | 0.495 |
201939 | 31.0 | 0.472 |
201939 | 32.0 | 0.402 |
201939 | 33.0 | 0.482 |
201939 | 33.0 | 0.434 |
steph.plot.scatter(x='PLAYER_AGE', y='FG_PCT')
*c* argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with *x* & *y*. Please use the *color* keyword-argument or provide a 2D array with a single row if you intend to specify the same RGB or RGBA value for all points.
<AxesSubplot:xlabel='PLAYER_AGE', ylabel='FG_PCT'>
from sklearn.linear_model import LinearRegression
X = steph[['PLAYER_AGE']]
X
PLAYER_AGE | |
---|---|
PLAYER_ID | |
201939 | 22.0 |
201939 | 23.0 |
201939 | 24.0 |
201939 | 25.0 |
201939 | 26.0 |
201939 | 27.0 |
201939 | 28.0 |
201939 | 29.0 |
201939 | 30.0 |
201939 | 31.0 |
201939 | 32.0 |
201939 | 33.0 |
201939 | 33.0 |
y = steph['FG_PCT']
y
PLAYER_ID
201939    0.462
201939    0.480
201939    0.490
201939    0.451
201939    0.471
201939    0.487
201939    0.504
201939    0.468
201939    0.495
201939    0.472
201939    0.402
201939    0.482
201939    0.434
Name: FG_PCT, dtype: float64
model = LinearRegression()
model
LinearRegression()
model.fit(X, y)
LinearRegression()
model.predict([[50]])
C:\Users\Vytis\anaconda3\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  warnings.warn(
array([0.41961746])
prediction = model.predict(X)
prediction
array([0.48234653, 0.48010621, 0.47786589, 0.47562556, 0.47338524, 0.47114491, 0.46890459, 0.46666427, 0.46442394, 0.46218362, 0.45994329, 0.45770297, 0.45770297])
steph['prediction'] = prediction
steph
PLAYER_AGE | FG_PCT | prediction | |
---|---|---|---|
PLAYER_ID | |||
201939 | 22.0 | 0.462 | 0.482347 |
201939 | 23.0 | 0.480 | 0.480106 |
201939 | 24.0 | 0.490 | 0.477866 |
201939 | 25.0 | 0.451 | 0.475626 |
201939 | 26.0 | 0.471 | 0.473385 |
201939 | 27.0 | 0.487 | 0.471145 |
201939 | 28.0 | 0.504 | 0.468905 |
201939 | 29.0 | 0.468 | 0.466664 |
201939 | 30.0 | 0.495 | 0.464424 |
201939 | 31.0 | 0.472 | 0.462184 |
201939 | 32.0 | 0.402 | 0.459943 |
201939 | 33.0 | 0.482 | 0.457703 |
201939 | 33.0 | 0.434 | 0.457703 |
plt.scatter(X, y)
plt.plot(X, prediction, 'r')
[<matplotlib.lines.Line2D at 0x204e80accd0>]
r_squared = model.score(X, y)
r_squared
0.09472000198151331
# a very poor fit: age explains little of the variation in FG_PCT, so accuracy does not appear to depend on age
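To make the R² value concrete, the fit can be reproduced by hand from the thirteen (age, FG_PCT) pairs listed above, using the closed-form least-squares formulas and R² = 1 - SS_res / SS_tot. This is a plain-Python sketch for checking the numbers, not how scikit-learn computes them internally; the function names are chosen for this example.

```python
# Stephen Curry's PLAYER_AGE and FG_PCT values from the table above
ages = [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 33]
fg_pct = [0.462, 0.480, 0.490, 0.451, 0.471, 0.487, 0.504,
          0.468, 0.495, 0.472, 0.402, 0.482, 0.434]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(ages, fg_pct)
r2 = r_squared(ages, fg_pct, slope, intercept)  # about 0.095
pred_50 = intercept + slope * 50                # about 0.42
```

An R² near 0.09 means the line accounts for under a tenth of the season-to-season variation. The value at age 50 lies far outside the observed 22 to 33 range, so it is an extrapolation rather than an estimate supported by the data.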
steph.plot.scatter('PLAYER_AGE', 'FG_PCT')
*c* argument looks like a single numeric RGB or RGBA sequence, which should be avoided as value-mapping will have precedence in case its length matches with *x* & *y*. Please use the *color* keyword-argument or provide a 2D array with a single row if you intend to specify the same RGB or RGBA value for all points.
<AxesSubplot:xlabel='PLAYER_AGE', ylabel='FG_PCT'>
players_stats1['FG3A_per_game'] = players_stats1['FG3A']/players_stats1['GP']
C:\Users\Vytis\AppData\Local\Temp/ipykernel_21052/205737034.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  players_stats1['FG3A_per_game'] = players_stats1['FG3A']/players_stats1['GP']
players_stats['FG3A_per_game']
PLAYER_ID
203518     3.632353
203518     2.946667
203518     4.096774
1630173    0.016393
1630173    1.208333
             ...
1627826         0.0
1627826         0.0
1627826    0.027778
1627826    0.055556
1627826         0.0
Name: FG3A_per_game, Length: 6888, dtype: object
players_stats.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 17974 entries, 51 to 1627826
Data columns (total 28 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   SEASON_ID          17974 non-null  int64
 1   TEAM_ID            17974 non-null  object
 2   TEAM_ABBREVIATION  17974 non-null  object
 3   PLAYER_AGE         17974 non-null  float64
 4   GP                 17974 non-null  object
 5   GS                 17974 non-null  object
 6   MIN                17974 non-null  float64
 7   FGM                17974 non-null  object
 8   FGA                17974 non-null  object
 9   FG_PCT             17974 non-null  float64
 10  FG3M               17966 non-null  object
 11  FG3A               17966 non-null  object
 12  FG3_PCT            17966 non-null  float64
 13  FTM                17974 non-null  object
 14  FTA                17974 non-null  object
 15  FT_PCT             17974 non-null  float64
 16  OREB               17974 non-null  object
 17  DREB               17974 non-null  object
 18  REB                17974 non-null  object
 19  AST                17974 non-null  object
 20  STL                17974 non-null  object
 21  BLK                17974 non-null  object
 22  TOV                17973 non-null  object
 23  PF                 17974 non-null  object
 24  PTS                17974 non-null  object
 25  PLAYER_NAME        17974 non-null  object
 26  TEAM_CITY_y        17974 non-null  object
 27  TEAM_NAME_y        17974 non-null  object
dtypes: float64(5), int64(1), object(22)
memory usage: 4.5+ MB
players_stats['FG3A'].dropna().astype(int)
PLAYER_ID 51 100 51 94 51 197 51 133 51 215 ... 1627826 0 1627826 0 1627826 2 1627826 4 1627826 0 Name: FG3A, Length: 17966, dtype: int32
players_stats['GP'].astype(int)
PLAYER_ID 51 67 51 81 51 81 51 80 51 73 .. 1627826 26 1627826 59 1627826 72 1627826 72 1627826 35 Name: GP, Length: 17974, dtype: int32
players_stats['FG3A_per_game'] = players_stats['FG3A']/players_stats['GP']
players_stats['FG3A_per_game']
PLAYER_ID 51 1.492537 51 1.160494 51 2.432099 51 1.6625 51 2.945205 ... 1627826 0.0 1627826 0.0 1627826 0.027778 1627826 0.055556 1627826 0.0 Name: FG3A_per_game, Length: 17974, dtype: object
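The `dtype: object` in the output above suggests the counting columns are still stored as Python objects: the `astype(int)` calls were displayed but never assigned back. Below is a minimal sketch, on a small made-up frame rather than the real `players_stats`, of converting the columns first so that the ratio comes out as `float64`. `pd.to_numeric` with `errors="coerce"` turns missing values into NaN.

```python
import pandas as pd

# Made-up sample mimicking object-typed columns with one missing value
sample = pd.DataFrame({"FG3A": [100, 94, None], "GP": [67, 81, 81]}, dtype=object)

# Assign the converted columns back; otherwise the frame keeps dtype object
for col in ["FG3A", "GP"]:
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

sample["FG3A_per_game"] = sample["FG3A"] / sample["GP"]
print(sample.dtypes)
```

With numeric dtypes in place, later steps such as correlation or scaling no longer have to deal with object columns.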
players_stats
SEASON_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP | GS | MIN | FGM | FGA | FG_PCT | ... | AST | STL | BLK | TOV | PF | PTS | PLAYER_NAME | TEAM_CITY_y | TEAM_NAME_y | FG3A_per_game | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PLAYER_ID | |||||||||||||||||||||
51 | 1990 | 1610612743 | DEN | 22.0 | 67 | 19 | 1505.0 | 417 | 1009 | 0.413 | ... | 206 | 55 | 4 | 110 | 149 | 942 | Mahmoud Abdul-Rauf | 1.492537 | ||
51 | 1991 | 1610612743 | DEN | 23.0 | 81 | 11 | 1538.0 | 356 | 845 | 0.421 | ... | 192 | 44 | 4 | 117 | 130 | 837 | Mahmoud Abdul-Rauf | 1.160494 | ||
51 | 1992 | 1610612743 | DEN | 24.0 | 81 | 81 | 2710.0 | 633 | 1407 | 0.450 | ... | 344 | 84 | 8 | 187 | 179 | 1553 | Mahmoud Abdul-Rauf | 2.432099 | ||
51 | 1993 | 1610612743 | DEN | 25.0 | 80 | 78 | 2617.0 | 588 | 1279 | 0.460 | ... | 362 | 82 | 10 | 151 | 150 | 1437 | Mahmoud Abdul-Rauf | 1.6625 | ||
51 | 1994 | 1610612743 | DEN | 26.0 | 73 | 43 | 2082.0 | 472 | 1005 | 0.470 | ... | 263 | 77 | 9 | 119 | 126 | 1165 | Mahmoud Abdul-Rauf | 2.945205 | ||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1627826 | 2018 | 1610612746 | LAC | 22.0 | 26 | 25 | 524.0 | 100 | 186 | 0.538 | ... | 38 | 10 | 24 | 37 | 64 | 244 | Ivica Zubac | LA | Clippers | 0.0 |
1627826 | 2018 | 0 | TOT | 22.0 | 59 | 37 | 1039.0 | 212 | 379 | 0.559 | ... | 63 | 14 | 51 | 70 | 137 | 525 | Ivica Zubac | LA | Clippers | 0.0 |
1627826 | 2019 | 1610612746 | LAC | 23.0 | 72 | 70 | 1326.0 | 236 | 385 | 0.613 | ... | 82 | 16 | 66 | 61 | 168 | 596 | Ivica Zubac | LA | Clippers | 0.027778 |
1627826 | 2020 | 1610612746 | LAC | 24.0 | 72 | 33 | 1609.0 | 257 | 394 | 0.652 | ... | 90 | 24 | 62 | 81 | 187 | 650 | Ivica Zubac | LA | Clippers | 0.055556 |
1627826 | 2021 | 1610612746 | LAC | 24.0 | 35 | 35 | 876.0 | 126 | 192 | 0.656 | ... | 38 | 19 | 41 | 48 | 80 | 336 | Ivica Zubac | LA | Clippers | 0.0 |
17974 rows × 29 columns
players_stats['SEASON_ID'].min()
1976
players_stats.to_csv('C:\\Users\\Vytis\\Desktop\\Perlis UŽT\\Duomenu analitika\\Python\\Baigiamasis darbas\\players_stats.csv')
from nba_api.stats.endpoints import shotchartdetail
# Helper files exist that make it easy to look up team and player IDs
teams = json.loads(requests.get('https://raw.githubusercontent.com/bttmly/nba/master/data/teams.json').text)
players = json.loads(requests.get('https://raw.githubusercontent.com/bttmly/nba/master/data/players.json').text)
# Define functions that return an ID for a given team or player name
# Team:
def get_team_id(team_name):
    # the argument must not share a name with the loop variable,
    # otherwise the comparison below can never succeed
    for team in teams:
        if team['teamName'] == team_name:
            return team['teamId']
    return -1
# Player:
def get_player_id(first, last):
    for player in players:
        if player['firstName'] == first and player['lastName'] == last:
            return player['playerId']
    return -1
# look up Jonas Valanciunas's ID
get_player_id('Jonas', 'Valanciunas')
202685
# look up Domantas Sabonis's ID
get_player_id('Domantas', 'Sabonis')
1627734
shot_json = shotchartdetail.ShotChartDetail(
    team_id = get_team_id('New Orleans Pelicans'),
    player_id = get_player_id('Jonas', 'Valanciunas'),
    context_measure_simple = 'PTS',
    season_nullable = '2020-21',
    season_type_all_star = 'Regular Season')
--------------------------------------------------------------------------- JSONDecodeError Traceback (most recent call last) ~\AppData\Local\Temp/ipykernel_21052/1692337837.py in <module> ----> 1 shot_json = shotchartdetail.ShotChartDetail( 2 team_id = get_team_id('New Orleans Pelicans'), 3 player_id = get_player_id('Jonas', 'Valanciunas'), 4 context_measure_simple = 'PTS', 5 season_nullable = '2020-21', ~\anaconda3\lib\site-packages\nba_api\stats\endpoints\shotchartdetail.py in __init__(self, team_id, player_id, context_measure_simple, last_n_games, league_id, month, opponent_team_id, period, season_type_all_star, ahead_behind_nullable, clutch_time_nullable, context_filter_nullable, date_from_nullable, date_to_nullable, end_period_nullable, end_range_nullable, game_id_nullable, game_segment_nullable, location_nullable, outcome_nullable, player_position_nullable, point_diff_nullable, position_nullable, range_type_nullable, rookie_year_nullable, season_nullable, season_segment_nullable, start_period_nullable, start_range_nullable, vs_conference_nullable, vs_division_nullable, proxy, headers, timeout, get_request) 88 } 89 if get_request: ---> 90 self.get_request() 91 92 def get_request(self): ~\anaconda3\lib\site-packages\nba_api\stats\endpoints\shotchartdetail.py in get_request(self) 98 timeout=self.timeout, 99 ) --> 100 self.load_response() 101 102 def load_response(self): ~\anaconda3\lib\site-packages\nba_api\stats\endpoints\shotchartdetail.py in load_response(self) 101 102 def load_response(self): --> 103 data_sets = self.nba_response.get_data_sets() 104 self.data_sets = [Endpoint.DataSet(data=data_set) for data_set_name, data_set in data_sets.items()] 105 self.league_averages = Endpoint.DataSet(data=data_sets['LeagueAverages']) ~\anaconda3\lib\site-packages\nba_api\stats\library\http.py in get_data_sets(self) 81 82 def get_data_sets(self): ---> 83 raw_dict = self.get_dict() 84 if 'resultSets' in raw_dict: 85 results = raw_dict['resultSets'] 
~\anaconda3\lib\site-packages\nba_api\library\http.py in get_dict(self) 39 40 def get_dict(self): ---> 41 return json.loads(self._response) 42 43 def get_json(self): ~\anaconda3\lib\json\__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw) 344 parse_int is None and parse_float is None and 345 parse_constant is None and object_pairs_hook is None and not kw): --> 346 return _default_decoder.decode(s) 347 if cls is None: 348 cls = JSONDecoder ~\anaconda3\lib\json\decoder.py in decode(self, s, _w) 335 336 """ --> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 338 end = _w(s, end).end() 339 if end != len(s): ~\anaconda3\lib\json\decoder.py in raw_decode(self, s, idx) 353 obj, end = self.scan_once(s, idx) 354 except StopIteration as err: --> 355 raise JSONDecodeError("Expecting value", s, err.value) from None 356 return obj, end JSONDecodeError: Expecting value: line 1 column 1 (char 0)
And this is where we stop :)))) which is a real pity
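One possible contributor to this failure is worth ruling out. If a lookup helper reuses its argument name as the loop variable (`for team in teams` inside `get_team_id(team)`), the name comparison can never succeed and the helper returns -1, which is then sent as `team_id`. Whether that alone explains the JSONDecodeError is not certain, since the request can also fail for other reasons. The sketch below only demonstrates the shadowing effect, using a tiny made-up `sample_teams` list with invented IDs in place of the downloaded JSON.

```python
# Hypothetical stand-in for the downloaded teams.json (IDs are invented)
sample_teams = [
    {"teamName": "New Orleans Pelicans", "teamId": 1},
    {"teamName": "Memphis Grizzlies", "teamId": 2},
]

def get_team_id_shadowed(team):
    # Bug: the loop variable rebinds `team`, so a str is compared with a dict
    for team in sample_teams:
        if team["teamName"] == team:
            return team["teamId"]
    return -1

def get_team_id_fixed(team_name):
    # Distinct names keep the argument intact during the loop
    for team in sample_teams:
        if team["teamName"] == team_name:
            return team["teamId"]
    return -1

shadowed_result = get_team_id_shadowed("New Orleans Pelicans")
fixed_result = get_team_id_fixed("New Orleans Pelicans")
print(shadowed_result, fixed_result)
```

Printing the IDs before passing them to `ShotChartDetail` is a cheap way to confirm that neither lookup fell through to -1.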
from nba_api.stats.endpoints import commonallplayers
# Pull every NBA player on record and build a DataFrame
all_players = commonallplayers.CommonAllPlayers().get_data_frames()[0]
all_players
PERSON_ID | DISPLAY_LAST_COMMA_FIRST | DISPLAY_FIRST_LAST | ROSTERSTATUS | FROM_YEAR | TO_YEAR | PLAYERCODE | PLAYER_SLUG | TEAM_ID | TEAM_CITY | TEAM_NAME | TEAM_ABBREVIATION | TEAM_CODE | TEAM_SLUG | GAMES_PLAYED_FLAG | OTHERLEAGUE_EXPERIENCE_CH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 76001 | Abdelnaby, Alaa | Alaa Abdelnaby | 0 | 1990 | 1994 | HISTADD_alaa_abdelnaby | alaa_abdelnaby | 0 | None | Y | 00 | ||||
1 | 76002 | Abdul-Aziz, Zaid | Zaid Abdul-Aziz | 0 | 1968 | 1977 | HISTADD_zaid_abdul-aziz | zaid_abdul-aziz | 0 | None | Y | 00 | ||||
2 | 76003 | Abdul-Jabbar, Kareem | Kareem Abdul-Jabbar | 0 | 1969 | 1988 | HISTADD_kareem_abdul-jabbar | kareem_abdul-jabbar | 0 | None | Y | 00 | ||||
3 | 51 | Abdul-Rauf, Mahmoud | Mahmoud Abdul-Rauf | 0 | 1990 | 2000 | mahmoud_abdul-rauf | mahmoud_abdul-rauf | 0 | None | Y | 00 | ||||
4 | 1505 | Abdul-Wahad, Tariq | Tariq Abdul-Wahad | 0 | 1997 | 2003 | tariq_abdul-wahad | tariq_abdul-wahad | 0 | None | Y | 00 | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4721 | 1627790 | Zizic, Ante | Ante Zizic | 0 | 2017 | 2019 | ante_zizic | ante_zizic | 0 | None | Y | 01 | ||||
4722 | 78647 | Zoet, Jim | Jim Zoet | 0 | 1982 | 1982 | HISTADD_jim_zoet | jim_zoet | 0 | None | Y | 00 | ||||
4723 | 78648 | Zopf, Bill | Bill Zopf | 0 | 1970 | 1970 | HISTADD_zip_zopf | bill_zopf | 0 | None | Y | 00 | ||||
4724 | 1627826 | Zubac, Ivica | Ivica Zubac | 1 | 2016 | 2021 | ivica_zubac | ivica_zubac | 1610612746 | LA | Clippers | LAC | clippers | clippers | Y | 01 |
4725 | 78650 | Zunic, Matt | Matt Zunic | 0 | 1948 | 1948 | HISTADD_matt_zunic | matt_zunic | 0 | None | Y | 00 |
4726 rows × 16 columns
all_players.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 4726 entries, 0 to 4725 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PERSON_ID 4726 non-null int64 1 DISPLAY_LAST_COMMA_FIRST 4726 non-null object 2 DISPLAY_FIRST_LAST 4726 non-null object 3 ROSTERSTATUS 4726 non-null int64 4 FROM_YEAR 4726 non-null object 5 TO_YEAR 4726 non-null object 6 PLAYERCODE 4725 non-null object 7 PLAYER_SLUG 4726 non-null object 8 TEAM_ID 4726 non-null int64 9 TEAM_CITY 4726 non-null object 10 TEAM_NAME 4726 non-null object 11 TEAM_ABBREVIATION 4726 non-null object 12 TEAM_CODE 4726 non-null object 13 TEAM_SLUG 575 non-null object 14 GAMES_PLAYED_FLAG 4726 non-null object 15 OTHERLEAGUE_EXPERIENCE_CH 4726 non-null object dtypes: int64(3), object(13) memory usage: 590.9+ KB
all_players['FROM_YEAR'].min()
'1946'
all_players['FROM_YEAR'] = all_players['FROM_YEAR'].astype('int64')
all_players.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 4726 entries, 0 to 4725 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PERSON_ID 4726 non-null int64 1 DISPLAY_LAST_COMMA_FIRST 4726 non-null object 2 DISPLAY_FIRST_LAST 4726 non-null object 3 ROSTERSTATUS 4726 non-null int64 4 FROM_YEAR 4726 non-null int64 5 TO_YEAR 4726 non-null object 6 PLAYERCODE 4725 non-null object 7 PLAYER_SLUG 4726 non-null object 8 TEAM_ID 4726 non-null int64 9 TEAM_CITY 4726 non-null object 10 TEAM_NAME 4726 non-null object 11 TEAM_ABBREVIATION 4726 non-null object 12 TEAM_CODE 4726 non-null object 13 TEAM_SLUG 575 non-null object 14 GAMES_PLAYED_FLAG 4726 non-null object 15 OTHERLEAGUE_EXPERIENCE_CH 4726 non-null object dtypes: int64(4), object(12) memory usage: 590.9+ KB
all_players['FROM_YEAR'].min()
1946
*The data goes back to the founding of the NBA. However, the game and player statistics have changed a great deal over time, so it is hard to pick a single cutoff year from which the data would be most appropriate. I chose my birth year, 1995.
all_players['TO_YEAR'] = all_players['TO_YEAR'].astype('int64')
all_players = all_players[all_players['TO_YEAR']>=1995]
all_players
C:\Users\Vytis\AppData\Local\Temp/ipykernel_20512/1869395000.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy all_players['TO_YEAR'] = all_players['TO_YEAR'].astype('int64')
PERSON_ID | DISPLAY_LAST_COMMA_FIRST | DISPLAY_FIRST_LAST | ROSTERSTATUS | FROM_YEAR | TO_YEAR | PLAYERCODE | PLAYER_SLUG | TEAM_ID | TEAM_CITY | TEAM_NAME | TEAM_ABBREVIATION | TEAM_CODE | TEAM_SLUG | GAMES_PLAYED_FLAG | OTHERLEAGUE_EXPERIENCE_CH | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 51 | Abdul-Rauf, Mahmoud | Mahmoud Abdul-Rauf | 0 | 1990 | 2000 | mahmoud_abdul-rauf | mahmoud_abdul-rauf | 0 | None | Y | 00 | ||||
4 | 1505 | Abdul-Wahad, Tariq | Tariq Abdul-Wahad | 0 | 1997 | 2003 | tariq_abdul-wahad | tariq_abdul-wahad | 0 | None | Y | 00 | ||||
5 | 949 | Abdur-Rahim, Shareef | Shareef Abdur-Rahim | 0 | 1996 | 2007 | shareef_abdur-rahim | shareef_abdur-rahim | 0 | None | Y | 00 | ||||
9 | 203518 | Abrines, Alex | Alex Abrines | 0 | 2016 | 2018 | alex_abrines | alex_abrines | 0 | None | Y | 00 | ||||
10 | 1630173 | Achiuwa, Precious | Precious Achiuwa | 1 | 2020 | 2021 | precious_achiuwa | precious_achiuwa | 1610612761 | Toronto | Raptors | TOR | raptors | raptors | Y | 00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4718 | 2583 | Zimmerman, Derrick | Derrick Zimmerman | 0 | 2005 | 2005 | derrick_zimmerman | derrick_zimmerman | 0 | None | Y | 01 | ||||
4719 | 1627757 | Zimmerman, Stephen | Stephen Zimmerman | 0 | 2016 | 2016 | stephen_zimmerman | stephen_zimmerman | 0 | None | Y | 01 | ||||
4720 | 1627835 | Zipser, Paul | Paul Zipser | 0 | 2016 | 2017 | paul_zipser | paul_zipser | 0 | None | Y | 01 | ||||
4721 | 1627790 | Zizic, Ante | Ante Zizic | 0 | 2017 | 2019 | ante_zizic | ante_zizic | 0 | None | Y | 01 | ||||
4724 | 1627826 | Zubac, Ivica | Ivica Zubac | 1 | 2016 | 2021 | ivica_zubac | ivica_zubac | 1610612746 | LA | Clippers | LAC | clippers | clippers | Y | 01 |
2538 rows × 16 columns
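The SettingWithCopyWarning above appears when a column is assigned on a frame that pandas considers a slice of another frame. A minimal sketch, on made-up rows, of one common way to avoid it: convert the column first, then filter and take an explicit `.copy()` before modifying the result. The `YEARS_SINCE_1995` column is purely illustrative.

```python
import pandas as pd

sample_players = pd.DataFrame({
    "PERSON_ID": [1, 2, 3],                 # made-up IDs
    "TO_YEAR": ["1988", "2000", "2021"],    # stored as strings, as in the API output
})

# Convert first, then filter and take an explicit copy
sample_players["TO_YEAR"] = sample_players["TO_YEAR"].astype("int64")
recent = sample_players[sample_players["TO_YEAR"] >= 1995].copy()

# Assigning on the copy is unambiguous and leaves the original frame untouched
recent["YEARS_SINCE_1995"] = recent["TO_YEAR"] - 1995
print(recent)
```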
# Note that the website may block you if too many queries are sent at once, so a one-second pause is added
# to each query to avoid that. An error log is also kept, since queries may still fail for unknown reasons,
# and recording the failed IDs makes it possible to re-query them later.
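The throttle-and-retry idea described in the comments above can be sketched independently of the API. In the example below, `fetch_all`, `fetch`, `pause` and `max_rounds` are illustrative names and not part of nba_api; the fake fetcher stands in for a real endpoint call so the example runs without network access.

```python
import time

def fetch_all(ids, fetch, pause=1.0, max_rounds=3):
    """Call fetch(id) for each id, pausing between calls and retrying failures."""
    results, pending = {}, list(ids)
    for _ in range(max_rounds):
        failed = []
        for item_id in pending:
            time.sleep(pause)  # throttle requests
            try:
                results[item_id] = fetch(item_id)
            except Exception:
                failed.append(item_id)  # remember for the next round
        pending = failed
        if not pending:
            break
    return results, pending

# Usage with a fake fetcher that fails once for ID 2
calls = {}
def flaky_fetch(item_id):
    calls[item_id] = calls.get(item_id, 0) + 1
    if item_id == 2 and calls[item_id] == 1:
        raise RuntimeError("temporary failure")
    return item_id * 10

results, still_missing = fetch_all([1, 2, 3], flaky_fetch, pause=0.0)
print(results, still_missing)
```

Collecting failures into a fresh list each round, and capping the number of rounds, keeps the retry loop from running forever when an ID fails permanently.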
pip install tqdm  # shows a progress bar while the scraping loop runs
Requirement already satisfied: tqdm in c:\users\vytis\anaconda3\lib\site-packages (4.62.3) Requirement already satisfied: colorama in c:\users\vytis\anaconda3\lib\site-packages (from tqdm) (0.4.4) Note: you may need to restart the kernel to use updated packages.
Extract each player's statistics for every season
from nba_api.stats.endpoints import playercareerstats
import time
# extract all IDs
players_ID = all_players['PERSON_ID']
# initialize an empty dataframe
players_stats = pd.DataFrame()
# record failed queries
error_log = []
# extracting stats
from tqdm import tqdm  # had to import it here, otherwise tqdm was not recognized
for ID in tqdm(players_ID):
    try:
        time.sleep(1)  # avoid submitting too many queries at the same time
        career = playercareerstats.PlayerCareerStats(player_id=ID)
        player_career = career.get_data_frames()[0]
        players_stats = pd.concat([players_stats, player_career], axis=0, ignore_index=True)
    except Exception:
        error_log.append(ID)
# re-query missing IDs. As far as I understand, the API does not always return all the data on the first pass, so this extra code is needed
still_missing = []  # IDs that fail a second time
for ID in tqdm(error_log):
    try:
        time.sleep(1)
        career = playercareerstats.PlayerCareerStats(player_id=ID)
        player_career = career.get_data_frames()[0]
        players_stats = pd.concat([players_stats, player_career], axis=0, ignore_index=True)
    except Exception:
        # appending to error_log here would extend the list being iterated over
        still_missing.append(ID)
# save result
with open("player_stats.pkl", 'wb') as f:
    pickle.dump(players_stats, f)
100%|████████████████████████████████████████████████████████████████████████████| 2538/2538 [1:50:29<00:00, 2.61s/it] 0it [00:00, ?it/s]
--------------------------------------------------------------------------- NameError Traceback (most recent call last) ~\AppData\Local\Temp/ipykernel_20512/2667977556.py in <module> 32 # save result 33 with open("player_stats.pkl",'wb') as f: ---> 34 pickle.dump(players_stats,f) NameError: name 'pickle' is not defined
Pickle is described as an 80-times-faster alternative for storing data, and it can store almost any Python object. One of its most common uses is saving machine learning models after training is complete, so the model does not have to be retrained every time the script runs.
import pickle
with open("player_stats.pkl", 'wb') as f:
    pickle.dump(players_stats, f)
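A minimal round-trip sketch of the save-and-reload step, assuming a small made-up frame and a temporary file instead of the real `player_stats.pkl`. It shows that a frame written with `pickle.dump` can be read back with `pd.read_pickle` and compares equal to the original.

```python
import os
import pickle
import tempfile

import pandas as pd

frame = pd.DataFrame({"PLAYER_ID": [51, 51], "GP": [67, 81]})  # made-up sample

# Write to a temporary location so the example leaves the working directory alone
path = os.path.join(tempfile.mkdtemp(), "sample_stats.pkl")
with open(path, "wb") as f:
    pickle.dump(frame, f)

restored = pd.read_pickle(path)
print(restored.equals(frame))
```

Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.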
Team statistics
from nba_api.stats.endpoints import teamyearbyyearstats
from nba_api.stats.static import teams
import time
import pickle
nba_teams = teams.get_teams()
teams_stats = pd.DataFrame()
error_teams =[]
from tqdm import tqdm
for i in tqdm(nba_teams):
    try:
        time.sleep(1)  # pause between queries
        team = teamyearbyyearstats.TeamYearByYearStats(team_id=i['id'])
        team_data = team.get_data_frames()[0]
        teams_stats = pd.concat([teams_stats, team_data], axis=0, ignore_index=True)
    except Exception:
        error_teams.append(i)  # keep the failed team for a later retry
with open("teams_stats.pkl", "wb") as f:
    pickle.dump(teams_stats, f)
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:51<00:00, 1.70s/it]
Player awards by season
from nba_api.stats.endpoints import playerawards
players_awards = pd.DataFrame()
error_awards = []
from tqdm import tqdm
for ID in tqdm(players_ID):
    try:
        time.sleep(1)  # pause between queries
        award = playerawards.PlayerAwards(player_id=ID)
        award_data = award.get_data_frames()[0]
        players_awards = pd.concat([players_awards, award_data], axis=0, ignore_index=True)
    except Exception:
        error_awards.append(ID)
# re-query the IDs that failed on the first pass
for ID in tqdm(error_awards):
    time.sleep(1)
    award = playerawards.PlayerAwards(player_id=ID)
    award_data = award.get_data_frames()[0]
    players_awards = pd.concat([players_awards, award_data], axis=0, ignore_index=True)
with open("players_awards.pkl", "wb") as f:
    pickle.dump(players_awards, f)
100%|██████████████████████████████████████████████████████████████████████████████| 2538/2538 [57:54<00:00, 1.37s/it] 0it [00:00, ?it/s]
players_stats = pd.read_pickle('C:\\Users\\Vytis\\player_stats.pkl')
players_stats.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 17974 entries, 0 to 17973 Data columns (total 27 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PLAYER_ID 17974 non-null object 1 SEASON_ID 17974 non-null object 2 LEAGUE_ID 17974 non-null object 3 TEAM_ID 17974 non-null object 4 TEAM_ABBREVIATION 17974 non-null object 5 PLAYER_AGE 17974 non-null float64 6 GP 17974 non-null object 7 GS 17974 non-null object 8 MIN 17974 non-null float64 9 FGM 17974 non-null object 10 FGA 17974 non-null object 11 FG_PCT 17974 non-null float64 12 FG3M 17966 non-null object 13 FG3A 17966 non-null object 14 FG3_PCT 17966 non-null float64 15 FTM 17974 non-null object 16 FTA 17974 non-null object 17 FT_PCT 17974 non-null float64 18 OREB 17974 non-null object 19 DREB 17974 non-null object 20 REB 17974 non-null object 21 AST 17974 non-null object 22 STL 17974 non-null object 23 BLK 17974 non-null object 24 TOV 17973 non-null object 25 PF 17974 non-null object 26 PTS 17974 non-null object dtypes: float64(5), object(22) memory usage: 3.7+ MB
teams_stats = pd.read_pickle('C:\\Users\\Vytis\\teams_stats.pkl')
teams_stats
TEAM_ID | TEAM_CITY | TEAM_NAME | YEAR | GP | WINS | LOSSES | WIN_PCT | CONF_RANK | DIV_RANK | ... | OREB | DREB | REB | AST | PF | STL | TOV | BLK | PTS | PTS_RANK | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1610612737 | Tri-Cities | Blackhawks | 1949-50 | 64 | 29 | 35 | 0.453 | 0 | 3 | ... | 0 | 0 | 0 | 1330 | 2057 | 0 | 0 | 0 | 5313 | 10 |
1 | 1610612737 | Tri-Cities | Blackhawks | 1950-51 | 68 | 25 | 43 | 0.368 | 0 | 5 | ... | 0 | 0 | 0 | 1476 | 2092 | 0 | 0 | 0 | 5730 | 3 |
2 | 1610612737 | Milwaukee | Hawks | 1951-52 | 66 | 17 | 49 | 0.258 | 0 | 5 | ... | 0 | 0 | 0 | 1229 | 1848 | 0 | 0 | 0 | 4833 | 10 |
3 | 1610612737 | Milwaukee | Hawks | 1952-53 | 71 | 27 | 44 | 0.380 | 0 | 5 | ... | 0 | 0 | 0 | 1427 | 2120 | 0 | 0 | 0 | 5389 | 9 |
4 | 1610612737 | Milwaukee | Hawks | 1953-54 | 72 | 21 | 51 | 0.292 | 0 | 4 | ... | 0 | 0 | 0 | 1298 | 1771 | 0 | 0 | 0 | 5038 | 9 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1562 | 1610612766 | Charlotte | Hornets | 2017-18 | 82 | 36 | 46 | 0.439 | 10 | 3 | ... | 827 | 2901 | 3728 | 1770 | 1409 | 559 | 1041 | 373 | 8874 | 10 |
1563 | 1610612766 | Charlotte | Hornets | 2018-19 | 82 | 39 | 43 | 0.476 | 9 | 2 | ... | 814 | 2778 | 3592 | 1905 | 1550 | 591 | 1001 | 405 | 9081 | 19 |
1564 | 1610612766 | Charlotte | Hornets | 2019-20 | 65 | 23 | 42 | 0.354 | 10 | 4 | ... | 715 | 2066 | 2781 | 1549 | 1223 | 428 | 949 | 268 | 6687 | 30 |
1565 | 1610612766 | Charlotte | Hornets | 2020-21 | 72 | 33 | 39 | 0.458 | 10 | 4 | ... | 762 | 2389 | 3151 | 1933 | 1298 | 565 | 1069 | 344 | 7881 | 23 |
1566 | 1610612766 | Charlotte | Hornets | 2021-22 | 37 | 19 | 18 | 0.514 | 7 | 2 | ... | 388 | 1222 | 1610 | 992 | 708 | 328 | 484 | 200 | 4245 | 2 |
1567 rows × 34 columns
players_awards = pd.read_pickle('C:\\Users\\Vytis\\players_awards.pkl')
players_awards
PERSON_ID | FIRST_NAME | LAST_NAME | TEAM | DESCRIPTION | ALL_NBA_TEAM_NUMBER | SEASON | MONTH | WEEK | CONFERENCE | TYPE | SUBTYPE1 | SUBTYPE2 | SUBTYPE3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 51 | Chris | Jackson | Denver Nuggets | All-Rookie Team | 2 | 1990-91 | None | None | 1610612743 | Award | Kia Motors | KIART | None |
1 | 51 | Mahmoud | Abdul-Rauf | Denver Nuggets | NBA Most Improved Player | None | 1992-93 | None | None | None | Award | Kia Motors | KIMIP | None |
2 | 949 | Shareef | Abdur-Rahim | Vancouver Grizzlies | All-Rookie Team | 1 | 1996-97 | None | None | 1610612763 | Award | Kia Motors | KIART | None |
3 | 949 | Shareef | Abdur-Rahim | Atlanta Hawks | NBA Player of the Week | None | 2001-02 | None | 2001-11-25T00:00:00 | East | Award | Kia Motors | KIPWK | None |
4 | 949 | Shareef | Abdur-Rahim | Vancouver Grizzlies | NBA Rookie of the Month | None | 1996-97 | 02/01/1997 | None | None | Award | Kia Motors | KIRMO | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3739 | 203092 | Tyler | Zeller | Cleveland Cavaliers | All-Rookie Team | 2 | 2012-13 | None | None | 1610612739 | Award | Kia Motors | KIART | None |
3740 | 1917 | Wang | Zhizhi | China | Olympic Appearance | None | 1996 | None | None | None | Award | Olympic | Appearance | None |
3741 | 1917 | Wang | Zhizhi | China | Olympic Appearance | None | 2000 | None | None | None | Award | Olympic | Appearance | None |
3742 | 1917 | Wang | Zhizhi | China | Olympic Appearance | None | 2008 | None | None | None | Award | Olympic | Appearance | None |
3743 | 1917 | Wang | Zhizhi | China | Olympic Appearance | None | 2012 | None | None | None | Award | Olympic | Appearance | None |
3744 rows × 14 columns
## Table cleanup
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import pickle
*I merged the player and team tables, because the team's overall standing likely affects a player's chances of winning MVP.
teams_stats.rename(columns={'YEAR':'SEASON_ID'},inplace=True)
players_teams = players_stats.merge(teams_stats,how='inner',on=['TEAM_ID','SEASON_ID'],suffixes=('_player','_team'))
players_teams
PLAYER_ID | SEASON_ID | LEAGUE_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP_player | GS | MIN | FGM_player | ... | OREB_team | DREB_team | REB_team | AST_team | PF_team | STL_team | TOV_team | BLK_team | PTS_team | PTS_RANK | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 51 | 1990-91 | 00 | 1610612743 | DEN | 22.0 | 67 | 19 | 1505.0 | 417 | ... | 1520 | 2530 | 4050 | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 |
1 | 149 | 1990-91 | 00 | 1610612743 | DEN | 28.0 | 66 | 66 | 2346.0 | 560 | ... | 1520 | 2530 | 4050 | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 |
2 | 246 | 1990-91 | 00 | 1610612743 | DEN | 27.0 | 41 | 2 | 659.0 | 85 | ... | 1520 | 2530 | 4050 | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 |
3 | 76433 | 1990-91 | 00 | 1610612743 | DEN | 24.0 | 58 | 25 | 1121.0 | 118 | ... | 1520 | 2530 | 4050 | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 |
4 | 422 | 1990-91 | 00 | 1610612743 | DEN | 26.0 | 21 | 4 | 217.0 | 29 | ... | 1520 | 2530 | 4050 | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16395 | 433 | 1983-84 | 00 | 1610612751 | NJN | 24.0 | 81 | 81 | 3003.0 | 495 | ... | 1221 | 2313 | 3534 | 2148 | 2243 | 814 | 1608 | 499 | 9019 | 13 |
16396 | 433 | 1987-88 | 00 | 1610612751 | NJN | 28.0 | 70 | 70 | 2637.0 | 466 | ... | 1075 | 2262 | 3337 | 1795 | 2042 | 727 | 1503 | 385 | 8235 | 22 |
16397 | 1006 | 1981-82 | 00 | 1610612754 | IND | 24.0 | 82 | 75 | 2277.0 | 407 | ... | 1141 | 2372 | 3513 | 1897 | 2041 | 753 | 1393 | 494 | 8379 | 22 |
16398 | 1006 | 1982-83 | 00 | 1610612754 | IND | 25.0 | 78 | 74 | 2513.0 | 580 | ... | 1299 | 2294 | 3593 | 2150 | 2086 | 755 | 1535 | 411 | 8911 | 12 |
16399 | 1006 | 1983-84 | 00 | 1610612754 | IND | 26.0 | 69 | 53 | 2279.0 | 411 | ... | 1002 | 2398 | 3400 | 2169 | 2061 | 834 | 1525 | 398 | 8566 | 19 |
16400 rows × 59 columns
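The merged table has 16400 rows, against 17974 in `players_stats`. With `how='inner'`, player rows without a matching team season are dropped; the combined `TOT` rows with `TEAM_ID` 0 seen earlier would presumably be among them. The sketch below, on made-up rows, uses a left merge with `indicator=True` to list the rows that find no match.

```python
import pandas as pd

players = pd.DataFrame({
    "PLAYER_ID": [1, 1, 1],
    "TEAM_ID": [10, 20, 0],          # 0 mimics a combined "TOT" row
    "SEASON_ID": ["2018-19"] * 3,
    "PTS": [281, 244, 525],
})
teams = pd.DataFrame({
    "TEAM_ID": [10, 20],
    "SEASON_ID": ["2018-19", "2018-19"],
    "PTS": [9000, 9400],
})

# Inner merge keeps only rows present in both tables
inner = players.merge(teams, how="inner", on=["TEAM_ID", "SEASON_ID"],
                      suffixes=("_player", "_team"))

# Left merge with an indicator column reveals what the inner merge dropped
check = players.merge(teams, how="left", on=["TEAM_ID", "SEASON_ID"],
                      suffixes=("_player", "_team"), indicator=True)
unmatched = check[check["_merge"] == "left_only"]
print(len(inner), list(unmatched["TEAM_ID"]))
```

Note that overlapping column names such as `PTS` come out as `PTS_player` and `PTS_team`, which matters for any later code that refers to columns by name.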
# Additional features: how many times per season the player won Player of the Week or Player of the Month
week = players_awards[players_awards['DESCRIPTION']=='NBA Player of the Week'].groupby(['PERSON_ID','SEASON'])
month = players_awards[players_awards['DESCRIPTION']=='NBA Player of the Month'].groupby(['PERSON_ID','SEASON'])
player_of_week = []
player_of_month = []
for i in tqdm(range(len(players_teams))):
    player_id = players_teams.loc[i,'PLAYER_ID']
    season_id = players_teams.loc[i,'SEASON_ID']
    try:
        player_of_month.append(month.get_group((player_id,season_id))['PERSON_ID'].count())
    except KeyError:
        # no such (player, season) group: the player won no monthly award that season
        player_of_month.append(0)
    try:
        player_of_week.append(week.get_group((player_id,season_id))['PERSON_ID'].count())
    except KeyError:
        # no weekly award that season
        player_of_week.append(0)
players_teams['player_of_week'] = player_of_week
players_teams['player_of_month'] = player_of_month
100%|█████████████████████████████████████████████████████████████████████████| 16400/16400 [00:01<00:00, 14564.11it/s]
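The same per-season award counts can also be produced without a Python loop. This is a sketch on made-up rows, not the notebook's method: count awards with `groupby(...).size()`, then left-merge the counts onto the season table and fill the gaps with 0.

```python
import pandas as pd

awards = pd.DataFrame({
    "PERSON_ID": [1, 1, 2],
    "SEASON": ["2001-02", "2001-02", "2001-02"],
    "DESCRIPTION": ["NBA Player of the Week"] * 3,
})
seasons = pd.DataFrame({
    "PLAYER_ID": [1, 2, 3],
    "SEASON_ID": ["2001-02", "2001-02", "2001-02"],
})

# One row per (player, season) with the number of weekly awards
week_counts = (
    awards[awards["DESCRIPTION"] == "NBA Player of the Week"]
    .groupby(["PERSON_ID", "SEASON"]).size()
    .rename("player_of_week").reset_index()
    .rename(columns={"PERSON_ID": "PLAYER_ID", "SEASON": "SEASON_ID"})
)

# Players without any award get NaN from the left merge, which becomes 0
seasons = seasons.merge(week_counts, how="left", on=["PLAYER_ID", "SEASON_ID"])
seasons["player_of_week"] = seasons["player_of_week"].fillna(0).astype(int)
print(seasons)
```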
# Key detail: whether the player won MVP or not
mvp = players_awards[players_awards['DESCRIPTION']=='NBA Most Valuable Player'].rename(columns={"PERSON_ID":"PLAYER_ID","SEASON":"SEASON_ID","TYPE":"MVP"})
df = players_teams.merge(mvp.loc[:,["PLAYER_ID","SEASON_ID","MVP"]],how="left",on=["PLAYER_ID","SEASON_ID"])
df["MVP"] = df.MVP.map({"Award":1,np.nan:0}) # 1 if the player won MVP that season, 0 otherwise
df
PLAYER_ID | SEASON_ID | LEAGUE_ID | TEAM_ID | TEAM_ABBREVIATION | PLAYER_AGE | GP_player | GS | MIN | FGM_player | ... | AST_team | PF_team | STL_team | TOV_team | BLK_team | PTS_team | PTS_RANK | player_of_week | player_of_month | MVP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 51 | 1990-91 | 00 | 1610612743 | DEN | 22.0 | 67 | 19 | 1505.0 | 417 | ... | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 | 0 | 0 | 0 |
1 | 149 | 1990-91 | 00 | 1610612743 | DEN | 28.0 | 66 | 66 | 2346.0 | 560 | ... | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 | 1 | 0 | 0 |
2 | 246 | 1990-91 | 00 | 1610612743 | DEN | 27.0 | 41 | 2 | 659.0 | 85 | ... | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 | 0 | 0 | 0 |
3 | 76433 | 1990-91 | 00 | 1610612743 | DEN | 24.0 | 58 | 25 | 1121.0 | 118 | ... | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 | 0 | 0 | 0 |
4 | 422 | 1990-91 | 00 | 1610612743 | DEN | 26.0 | 21 | 4 | 217.0 | 29 | ... | 2005 | 2235 | 856 | 1332 | 406 | 9828 | 1 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16395 | 433 | 1983-84 | 00 | 1610612751 | NJN | 24.0 | 81 | 81 | 3003.0 | 495 | ... | 2148 | 2243 | 814 | 1608 | 499 | 9019 | 13 | 1 | 0 | 0 |
16396 | 433 | 1987-88 | 00 | 1610612751 | NJN | 28.0 | 70 | 70 | 2637.0 | 466 | ... | 1795 | 2042 | 727 | 1503 | 385 | 8235 | 22 | 0 | 0 | 0 |
16397 | 1006 | 1981-82 | 00 | 1610612754 | IND | 24.0 | 82 | 75 | 2277.0 | 407 | ... | 1897 | 2041 | 753 | 1393 | 494 | 8379 | 22 | 0 | 0 | 0 |
16398 | 1006 | 1982-83 | 00 | 1610612754 | IND | 25.0 | 78 | 74 | 2513.0 | 580 | ... | 2150 | 2086 | 755 | 1535 | 411 | 8911 | 12 | 0 | 0 | 0 |
16399 | 1006 | 1983-84 | 00 | 1610612754 | IND | 26.0 | 69 | 53 | 2279.0 | 411 | ... | 2169 | 2061 | 834 | 1525 | 398 | 8566 | 19 | 0 | 0 | 0 |
16400 rows × 62 columns
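A minimal sketch of the labelling step on made-up rows. After the left merge, seasons without an MVP award have a missing value in the `MVP` column, so `notna()` gives the 0/1 flag directly; this is an alternative formulation, shown only for illustration.

```python
import pandas as pd

seasons = pd.DataFrame({"PLAYER_ID": [1, 2], "SEASON_ID": ["2015-16", "2015-16"]})
mvp = pd.DataFrame({"PLAYER_ID": [1], "SEASON_ID": ["2015-16"], "MVP": ["Award"]})

# Left merge keeps every season; unmatched rows get a missing MVP value
labeled = seasons.merge(mvp, how="left", on=["PLAYER_ID", "SEASON_ID"])

# True where an award row matched, converted to 1/0
labeled["MVP"] = labeled["MVP"].notna().astype(int)
print(labeled)
```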
df.to_csv('nba_stats.csv')
with open("df.pkl", 'wb') as f:
    pickle.dump(df, f)
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 16400 entries, 0 to 16399 Data columns (total 59 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PLAYER_ID 16400 non-null object 1 SEASON_ID 16400 non-null int64 2 PLAYER_AGE 16400 non-null float64 3 GP_player 16400 non-null object 4 GS 16400 non-null object 5 MIN 16400 non-null float64 6 FGM_player 16400 non-null object 7 FGA_player 16400 non-null object 8 FG_PCT_player 16400 non-null float64 9 FG3M_player 16393 non-null object 10 FG3A_player 16393 non-null object 11 FG3_PCT_player 16393 non-null float64 12 FTM_player 16400 non-null object 13 FTA_player 16400 non-null object 14 FT_PCT_player 16400 non-null float64 15 OREB_player 16400 non-null object 16 DREB_player 16400 non-null object 17 REB_player 16400 non-null object 18 AST_player 16400 non-null object 19 STL_player 16400 non-null object 20 BLK_player 16400 non-null object 21 TOV_player 16399 non-null object 22 PF_player 16400 non-null object 23 PTS_player 16400 non-null object 24 TEAM_CITY 16400 non-null object 25 TEAM_NAME 16400 non-null object 26 GP_team 16400 non-null int64 27 WINS 16400 non-null int64 28 LOSSES 16400 non-null int64 29 WIN_PCT 16400 non-null float64 30 CONF_RANK 16400 non-null int64 31 DIV_RANK 16400 non-null int64 32 PO_WINS 16400 non-null int64 33 PO_LOSSES 16400 non-null int64 34 CONF_COUNT 16400 non-null float64 35 DIV_COUNT 16400 non-null int64 36 NBA_FINALS_APPEARANCE 16400 non-null object 37 FGM_team 16400 non-null int64 38 FGA_team 16400 non-null int64 39 FG_PCT_team 16400 non-null float64 40 FG3M_team 16400 non-null int64 41 FG3A_team 16400 non-null int64 42 FG3_PCT_team 16400 non-null float64 43 FTM_team 16400 non-null int64 44 FTA_team 16400 non-null int64 45 FT_PCT_team 16400 non-null float64 46 OREB_team 16400 non-null int64 47 DREB_team 16400 non-null int64 48 REB_team 16400 non-null int64 49 AST_team 16400 non-null int64 50 PF_team 16400 non-null int64 51 STL_team 16400 non-null int64 52 TOV_team 
16400 non-null int64 53 BLK_team 16400 non-null int64 54 PTS_team 16400 non-null int64 55 PTS_RANK 16400 non-null int64 56 player_of_week 16400 non-null int64 57 player_of_month 16400 non-null int64 58 MVP 16400 non-null int64 dtypes: float64(10), int64(28), object(21) memory usage: 7.5+ MB
df = pd.read_pickle('df.pkl')
# drop redundant columns
df.drop(columns=["LEAGUE_ID","TEAM_ID","TEAM_ABBREVIATION"],inplace=True)
# convert the season column to int (e.g. '1990-91' becomes 1990)
df['SEASON_ID'] = df['SEASON_ID'].map(lambda x: int(x.split("-",1)[0]))
num_cols = ['PLAYER_AGE', 'GP', 'GS', 'MIN', 'FGM', 'FGA', 'FG_PCT', 'FG3M', 'FG3A',
            'FG3_PCT', 'FTM', 'FTA', 'FT_PCT', 'OREB', 'DREB', 'REB', 'AST', 'STL',
            'BLK', 'TOV', 'PF', 'PTS', 'WIN_PCT', 'CONF_RANK', 'DIV_RANK',
            'player_of_week', 'player_of_month']
pip install scikit-learn
Requirement already satisfied: scikit-learn in c:\users\vytis\anaconda3\lib\site-packages (1.0.2) Requirement already satisfied: scipy>=1.1.0 in c:\users\vytis\anaconda3\lib\site-packages (from scikit-learn) (1.7.1) Requirement already satisfied: joblib>=0.11 in c:\users\vytis\anaconda3\lib\site-packages (from scikit-learn) (1.1.0) Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\vytis\anaconda3\lib\site-packages (from scikit-learn) (2.2.0) Requirement already satisfied: numpy>=1.14.6 in c:\users\vytis\anaconda3\lib\site-packages (from scikit-learn) (1.20.3) Note: you may need to restart the kernel to use updated packages.
# Attempt to scale the data season by season. It did not work
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
for year in df["SEASON_ID"].unique().tolist():
    for col in num_cols:
        df.loc[df['SEASON_ID']==year, col] = scaler.fit_transform(df.loc[df['SEASON_ID']==year, col].to_numpy().reshape(-1,1))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_21052/2661967775.py in <module>
      4 for year in df["SEASON_ID"].unique().tolist():
      5     for col in num_cols:
----> 6         df.loc[df['SEASON_ID']==year, col] = scaler.fit_transform(df.loc[df['SEASON_ID']==year, col].to_numpy().reshape(-1,1))
...
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
KeyError: 'GP'
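The loop above scales each numeric column season by season. A minimal sketch of the same idea with `groupby(...).transform`, assuming the scaling is z-score standardization (what scikit-learn's `StandardScaler` does; the notebook's actual `scaler` is not shown in this section). The function name `zscore_by_season` and the demo values are hypothetical. It also checks the column names up front, so a missing name such as 'GP' is reported before any work is done.

```python
import pandas as pd


def zscore_by_season(df, cols, season_col="SEASON_ID"):
    """Return a copy of df with `cols` standardized within each season."""
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise KeyError(f"columns not in frame: {missing}")

    out = df.copy()
    # Object columns that hold numbers are converted to float first.
    out[cols] = out[cols].astype(float)

    def zscore(values):
        # ddof=0 is the population standard deviation.
        return (values - values.mean()) / values.std(ddof=0)

    out[cols] = out.groupby(season_col)[cols].transform(zscore)
    return out


# Made-up demo data: two seasons, two players each.
demo = pd.DataFrame({
    "SEASON_ID": [2020, 2020, 2021, 2021],
    "GP_player": [10, 30, 40, 80],
    "PTS_player": [100.0, 300.0, 200.0, 600.0],
})
scaled = zscore_by_season(demo, ["GP_player", "PTS_player"])
print(scaled)
```

A season in which a column is constant has zero standard deviation and yields NaN here, so such columns would need separate handling.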
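# Train on seasons 1995-2020 and hold out 2021 as the test set.
train = df.loc[(df['SEASON_ID']>=1995)&(df['SEASON_ID']<=2020),:]
test = df.loc[(df['SEASON_ID']==2021),:]
y = train['MVP']
# Assumes MVP is the last column; copy so that later edits to X do not touch df.
X = train.iloc[:,:-1].copy()
y_test = test['MVP']
X_test = test.iloc[:,:-1].copy()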
print(y.value_counts())
y.value_counts().plot(kind='bar')
0    13640
1       26
Name: MVP, dtype: int64
[Figure: bar chart of MVP class counts]
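The counts above show 26 MVP rows against 13640 non-MVP rows, so the target is heavily imbalanced. One common way to account for this is to weight classes inversely to their frequency. The sketch below computes such weights with the formula n_samples / (n_classes * class_count), which is the one scikit-learn uses for `class_weight='balanced'`. The helper name `balanced_class_weights` is hypothetical, and nothing in the notebook says the author applied weighting.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Class counts taken from the value_counts() output above.
labels = [0] * 13640 + [1] * 26
weights = balanced_class_weights(labels)
print(weights)
```

With these counts the positive class gets a weight several hundred times larger than the negative class, which shows how rare an MVP season is in the training data.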
# Correlate numeric columns only; string columns such as TEAM_CITY have no correlation.
corrmat = df.select_dtypes(include='number').corr()
k = 10  # show the k features most correlated with MVP
cols = corrmat.nlargest(k, 'MVP')['MVP'].index
cm = np.corrcoef(df[cols].values.T)
sns.set(font_scale=1.0)
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f', annot_kws={'size': 8}, yticklabels=cols.values, xticklabels=cols.values)
plt.show()
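# Drop correlated features: a column is removed if |r| > 0.8 with any earlier column.
X_corr = X.select_dtypes(include='number').corr()
corr_names = set()
for i in range(len(X_corr.columns)):
    for j in range(i):
        if abs(X_corr.iloc[i, j]) > 0.8:
            col = X_corr.columns[i]
            corr_names.add(col)
# Reassign instead of dropping in place to avoid modifying a slice of df.
X = X.drop(columns=list(corr_names))
X_test = X_test.drop(columns=list(corr_names))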
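The correlation filter above can be checked on a small frame. The sketch below wraps the same rule (drop a column when its absolute correlation with any earlier column exceeds a threshold) in a function. The function name `correlated_columns` and the demo values are made up for illustration.

```python
import pandas as pd


def correlated_columns(frame, threshold=0.8):
    """Return columns whose |correlation| with an earlier column exceeds threshold."""
    corr = frame.select_dtypes(include="number").corr().abs()
    to_drop = []
    for i, col in enumerate(corr.columns):
        # Only compare against columns to the left, so the first of a
        # correlated pair is always kept.
        if (corr.iloc[i, :i] > threshold).any():
            to_drop.append(col)
    return to_drop


# Made-up demo data: FGA is almost exactly twice FGM, AGE is unrelated.
demo = pd.DataFrame({
    "FGM": [1.0, 2.0, 3.0, 4.0, 5.0],
    "FGA": [2.0, 4.1, 6.0, 8.2, 10.0],
    "AGE": [30.0, 22.0, 27.0, 24.0, 29.0],
})
dropped = correlated_columns(demo)
reduced = demo.drop(columns=dropped)
print(dropped, list(reduced.columns))
```

Because only earlier columns are compared, the result depends on column order: reordering the frame can change which member of a correlated pair survives.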
features = ['PLAYER_AGE', 'GP_player', 'GS', 'MIN', 'FGM_player', 'FGA_player',
'FG_PCT_player', 'FG3M_player', 'FG3A_player', 'FG3_PCT_player',
'FTM_player', 'FTA_player', 'FT_PCT_player', 'OREB_player',
'DREB_player', 'REB_player', 'AST_player', 'STL_player', 'BLK_player',
'TOV_player', 'PF_player', 'PTS_player', 'WINS','player_of_week', 'player_of_month']
pip install statsmodels
Requirement already satisfied: statsmodels in c:\users\vytis\anaconda3\lib\site-packages (0.13.1)
Requirement already satisfied: pandas>=0.25 in c:\users\vytis\anaconda3\lib\site-packages (from statsmodels) (1.3.4)
Requirement already satisfied: numpy>=1.17 in c:\users\vytis\anaconda3\lib\site-packages (from statsmodels) (1.20.3)
Requirement already satisfied: patsy>=0.5.2 in c:\users\vytis\anaconda3\lib\site-packages (from statsmodels) (0.5.2)
Requirement already satisfied: scipy>=1.3 in c:\users\vytis\anaconda3\lib\site-packages (from statsmodels) (1.7.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\vytis\anaconda3\lib\site-packages (from pandas>=0.25->statsmodels) (2021.3)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\vytis\anaconda3\lib\site-packages (from pandas>=0.25->statsmodels) (2.8.2)
Requirement already satisfied: six in c:\users\vytis\anaconda3\lib\site-packages (from patsy>=0.5.2->statsmodels) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
WARNING: Ignoring invalid distribution -tatsmodels (c:\users\vytis\anaconda3\lib\site-packages)
import statsmodels.api as sm
X_logit = X.drop(columns=['PLAYER_ID','SEASON_ID'])
ols = sm.Logit(y,X_logit.astype('float'))
result = ols.fit()
p_values = result.summary2().tables[1]['P>|z|']
p_values = pd.Series(p_values).sort_values(ascending = True)
p_values[p_values<=0.05]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_21052/4172156579.py in <module>
      1 import statsmodels.api as sm
      2 X_logit = X.drop(columns=['PLAYER_ID','SEASON_ID'])
----> 3 ols = sm.Logit(y,X_logit.astype('float'))
      4 result = ols.fit()
      5 p_values = result.summary2().tables[1]['P>|z|']
...
~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
-> 1201         return arr.astype(dtype, copy=True)
ValueError: could not convert string to float: 'Denver'
X_logit.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 13666 entries, 52 to 16289
Data columns (total 43 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   PLAYER_AGE             13666 non-null  float64
 1   GP_player              13666 non-null  object
 2   GS                     13666 non-null  object
 3   MIN                    13666 non-null  float64
 4   FGM_player             13666 non-null  object
 5   FGA_player             13666 non-null  object
 6   FG_PCT_player          13666 non-null  float64
 7   FG3M_player            13666 non-null  object
 8   FG3A_player            13666 non-null  object
 9   FG3_PCT_player         13666 non-null  float64
 10  FTM_player             13666 non-null  object
 11  FTA_player             13666 non-null  object
 12  FT_PCT_player          13666 non-null  float64
 13  OREB_player            13666 non-null  object
 14  DREB_player            13666 non-null  object
 15  REB_player             13666 non-null  object
 16  AST_player             13666 non-null  object
 17  STL_player             13666 non-null  object
 18  BLK_player             13666 non-null  object
 19  TOV_player             13666 non-null  object
 20  PF_player              13666 non-null  object
 21  PTS_player             13666 non-null  object
 22  TEAM_CITY              13666 non-null  object
 23  TEAM_NAME              13666 non-null  object
 24  GP_team                13666 non-null  int64
 25  WINS                   13666 non-null  int64
 26  PO_WINS                13666 non-null  int64
 27  CONF_COUNT             13666 non-null  float64
 28  DIV_COUNT              13666 non-null  int64
 29  NBA_FINALS_APPEARANCE  13666 non-null  object
 30  FG_PCT_team            13666 non-null  float64
 31  FG3M_team              13666 non-null  int64
 32  FG3_PCT_team           13666 non-null  float64
 33  FTM_team               13666 non-null  int64
 34  FT_PCT_team            13666 non-null  float64
 35  OREB_team              13666 non-null  int64
 36  PF_team                13666 non-null  int64
 37  STL_team               13666 non-null  int64
 38  TOV_team               13666 non-null  int64
 39  BLK_team               13666 non-null  int64
 40  PTS_RANK               13666 non-null  int64
 41  player_of_week         13666 non-null  int64
 42  player_of_month        13666 non-null  int64
dtypes: float64(9), int64(14), object(20)
memory usage: 4.6+ MB
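The `info()` output above explains the ValueError: 20 columns have dtype object, and at least one of them (TEAM_CITY) holds strings such as 'Denver' that cannot be cast to float. One possible way to build a numeric matrix for `sm.Logit` is sketched below: convert every column with `pd.to_numeric`, keep the columns that convert cleanly, and report the rest. The helper name `numeric_design_matrix` and the demo values are hypothetical, and this is not necessarily the fix the author went on to use.

```python
import pandas as pd


def numeric_design_matrix(frame):
    """Return a float-only copy of frame and the list of columns left out."""
    # Values that cannot be parsed as numbers become NaN.
    converted = frame.apply(pd.to_numeric, errors="coerce")
    # Keep a column only if conversion introduced no new missing values.
    keep = [
        col for col in frame.columns
        if converted[col].notna().sum() == frame[col].notna().sum()
    ]
    dropped = [col for col in frame.columns if col not in keep]
    return converted[keep].astype(float), dropped


# Made-up demo data mirroring the dtypes above: numbers stored as object,
# a true string column, and an integer column.
demo = pd.DataFrame({
    "GP_player": pd.Series([82, 70, 65], dtype=object),
    "TEAM_CITY": ["Denver", "Boston", "Miami"],
    "WINS": [48, 52, 44],
})
X_num, dropped = numeric_design_matrix(demo)
print(X_num.dtypes)
print("left out:", dropped)
```

Columns reported in `dropped` are not necessarily useless: a categorical column could be one-hot encoded instead of being left out, which is a separate modeling decision.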
mkdir nba-sql
cd nba-sql
C:\Users\Vytis\nba-sql
python -m venv venv
  File "C:\Users\Vytis\AppData\Local\Temp/ipykernel_20512/717537615.py", line 1
    python -m venv venv
              ^
SyntaxError: invalid syntax
$ source venv/bin/activate
  File "C:\Users\Vytis\AppData\Local\Temp/ipykernel_20512/1330130224.py", line 1
    $ source venv/bin/activate
    ^
SyntaxError: invalid syntax
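Both SyntaxErrors above occur because the cell contents are shell commands, which the notebook tries to parse as Python. In IPython a shell command can be run by prefixing it with `!`, but an environment activated that way does not persist beyond that one command. Also note that `venv/bin/activate` is the POSIX layout; on Windows the scripts live under `venv\Scripts`. As an alternative, the sketch below creates the environment from Python with the standard-library `venv` module. The helper name `create_env` is hypothetical, and `with_pip=False` is chosen only to keep the demo fast.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a virtual environment and return the path of its activate script."""
    # with_pip=False skips installing pip; use with_pip=True for real work.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    scripts = "Scripts" if os.name == "nt" else "bin"
    return Path(env_dir) / scripts / "activate"


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "venv"
    activate = create_env(env_dir)
    created = (env_dir / "pyvenv.cfg").exists()
    print("environment created:", created)
    print("activate script:", activate)
```

Activation itself still has to happen in a terminal, not in a notebook cell, because it only changes the environment of the shell that runs it.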